Overview

Dataset statistics

Number of variables: 49
Number of observations: 499
Missing cells: 4729
Missing cells (%): 19.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 191.1 KiB
Average record size in memory: 392.3 B
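
The missing-cell percentage follows from the counts in this block: the table holds 49 × 499 cells, of which 4729 are empty. A minimal Python check of that arithmetic (the variable names are illustrative and not part of the report):

```python
# Counts copied from the dataset statistics of this report.
n_variables = 49
n_observations = 499
missing_cells = 4729

total_cells = n_variables * n_observations        # every row/column intersection
missing_pct = 100 * missing_cells / total_cells   # share of cells that are empty

print(f"{total_cells} cells in total, {missing_pct:.1f}% missing")
```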

Variable types

Numeric: 11
Categorical: 38

Alerts

sm_aware has a high cardinality: 65 distinct values High cardinality
sm_data_use has a high cardinality: 162 distinct values High cardinality
ethic_appr has a high cardinality: 498 distinct values High cardinality
study_1_conc has a high cardinality: 270 distinct values High cardinality
study_1_add_info has a high cardinality: 107 distinct values High cardinality
study_2_conc has a high cardinality: 283 distinct values High cardinality
study_2_add_info has a high cardinality: 109 distinct values High cardinality
study_3_conc has a high cardinality: 247 distinct values High cardinality
study_3_add_info has a high cardinality: 92 distinct values High cardinality
study_4_conc has a high cardinality: 267 distinct values High cardinality
study_4_add_info has a high cardinality: 104 distinct values High cardinality
design_add_fac has a high cardinality: 261 distinct values High cardinality
rank_add_fac_1 has a high cardinality: 118 distinct values High cardinality
lat is highly correlated with long and 3 other fields High correlation
long is highly correlated with lat and 3 other fields High correlation
study_3_add_info is highly correlated with lat and 18 other fields High correlation
rank_add_fac_1_pos is highly correlated with sm_aware and 3 other fields High correlation
rank_add_fac_2 is highly correlated with lat and 17 other fields High correlation
rank_add_fac_2_pos is highly correlated with politic_pref and 7 other fields High correlation
rank_add_fac_3 is highly correlated with lat and 14 other fields High correlation
rank_add_fac_3_pos is highly correlated with politic_pref and 6 other fields High correlation
sm_use is highly correlated with study_3_add_info and 1 other fields High correlation
age is highly correlated with study_3_add_info and 1 other fields High correlation
gender_id is highly correlated with ethnic_id and 3 other fields High correlation
ethnic_id is highly correlated with gender_id and 1 other fields High correlation
edu is highly correlated with study_3_add_info High correlation
politic_pref is highly correlated with ethnic_id and 3 other fields High correlation
sm_aware is highly correlated with sm_expmt_inerct and 3 other fields High correlation
sm_expmt_inerct is highly correlated with sm_aware and 2 other fields High correlation
study_1_ethic_acc is highly correlated with study_2_ethic_acc and 4 other fields High correlation
study_2_ethic_acc is highly correlated with study_1_ethic_acc and 2 other fields High correlation
study_3_ethic_acc is highly correlated with rank_add_fac_2 and 1 other fields High correlation
study_4_ethic_acc is highly correlated with study_1_ethic_acc and 2 other fields High correlation
design_cont is highly correlated with study_3_add_info and 8 other fields High correlation
design_num_users is highly correlated with study_3_add_info and 6 other fields High correlation
design_res_purp is highly correlated with study_3_add_info and 8 other fields High correlation
design_len_data is highly correlated with study_3_add_info and 6 other fields High correlation
design_admin_inter is highly correlated with study_3_add_info and 7 other fields High correlation
design_inter_type is highly correlated with study_3_add_info and 4 other fields High correlation
design_partic_aware is highly correlated with study_1_ethic_acc and 3 other fields High correlation
design_inter_impact is highly correlated with study_3_add_info and 7 other fields High correlation
design_type_data is highly correlated with study_3_add_info and 9 other fields High correlation
rank_sci_repro is highly correlated with rank_just and 1 other fields High correlation
rank_resp is highly correlated with rank_harms and 2 other fields High correlation
rank_just is highly correlated with rank_sci_repro and 1 other fields High correlation
rank_anony is highly correlated with rank_add_fac_2_pos High correlation
rank_harms is highly correlated with study_3_add_info and 4 other fields High correlation
rank_balance is highly correlated with rank_sci_repro and 3 other fields High correlation
study_1_conc has 204 (40.9%) missing values Missing
study_1_add_info has 353 (70.7%) missing values Missing
study_2_conc has 191 (38.3%) missing values Missing
study_2_add_info has 355 (71.1%) missing values Missing
study_3_conc has 225 (45.1%) missing values Missing
study_3_add_info has 371 (74.3%) missing values Missing
study_4_conc has 203 (40.7%) missing values Missing
study_4_add_info has 355 (71.1%) missing values Missing
design_add_fac has 103 (20.6%) missing values Missing
rank_add_fac_1 has 351 (70.3%) missing values Missing
rank_add_fac_1_pos has 342 (68.5%) missing values Missing
rank_add_fac_2 has 431 (86.4%) missing values Missing
rank_add_fac_2_pos has 402 (80.6%) missing values Missing
rank_add_fac_3 has 435 (87.2%) missing values Missing
rank_add_fac_3_pos has 408 (81.8%) missing values Missing
df_index is uniformly distributed Uniform
ethic_appr is uniformly distributed Uniform
df_index has unique values Unique

Reproduction

Analysis started: 2022-11-16 16:17:50.519771
Analysis finished: 2022-11-16 16:18:11.658829
Duration: 21.14 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
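
A report of this kind can be regenerated with the `ProfileReport` class of pandas-profiling. The sketch below is a hedged illustration only: the CSV path, the output file name and the title are hypothetical, and the configuration actually used for this run is the one in `config.json`, which is not reproduced here.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (both paths are hypothetical)."""
    # Imported lazily so the function can be defined even where
    # pandas and pandas-profiling are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is illustrative; default settings are assumed for everything else.
    profile = ProfileReport(df, title="Survey profile")
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("survey.csv")` would write `report.html` next to the script, assuming a file of that name exists.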

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 499
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 250
Minimum: 1
Maximum: 499
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 25.9
Q1: 125.5
median: 250
Q3: 374.5
95-th percentile: 474.1
Maximum: 499
Range: 498
Interquartile range (IQR): 249

Descriptive statistics

Standard deviation: 144.1931575
Coefficient of variation (CV): 0.57677263
Kurtosis: -1.2
Mean: 250
Median Absolute Deviation (MAD): 125
Skewness: 0
Sum: 124750
Variance: 20791.66667
Monotonicity: Strictly increasing
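
Because df_index is simply the integers 1 to 499, these descriptive statistics can be reproduced with the Python standard library. The sketch below assumes the report uses the sample variance (n - 1 denominator), which is what matches the reported value; the variable names are illustrative.

```python
import statistics

values = list(range(1, 500))  # df_index: 1, 2, ..., 499 with no gaps

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median distance of each value from the median.
mad = statistics.median([abs(v - median) for v in values])

print(total, mean, round(variance, 5), round(std, 7), round(cv, 8), mad)
```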
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   1      0.2%
329                 1      0.2%
342                 1      0.2%
341                 1      0.2%
340                 1      0.2%
339                 1      0.2%
338                 1      0.2%
337                 1      0.2%
336                 1      0.2%
335                 1      0.2%
Other values (489)  489    98.0%

Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%

Value  Count  Frequency (%)
499    1      0.2%
498    1      0.2%
497    1      0.2%
496    1      0.2%
495    1      0.2%
494    1      0.2%
493    1      0.2%
492    1      0.2%
491    1      0.2%
490    1      0.2%

lat
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 468
Distinct (%): 93.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.4252477
Minimum: 25.4572
Maximum: 47.8978
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 25.4572
5-th percentile: 28.03196
Q1: 33.8946
median: 38.9507
Q3: 41.1971
95-th percentile: 44.26215
Maximum: 47.8978
Range: 22.4406
Interquartile range (IQR): 7.3025

Descriptive statistics

Standard deviation: 5.144001204
Coefficient of variation (CV): 0.1374473523
Kurtosis: -0.5901140424
Mean: 37.4252477
Median Absolute Deviation (MAD): 3.411
Skewness: -0.5037155839
Sum: 18675.1986
Variance: 26.46074839
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
37.751              4      0.8%
34.8965             3      0.6%
36.1671             3      0.6%
34.0007             2      0.4%
40.8275             2      0.4%
42.2314             2      0.4%
39.0805             2      0.4%
26.1481             2      0.4%
40.7035             2      0.4%
40.3226             2      0.4%
Other values (458)  475    95.2%

Value    Count  Frequency (%)
25.4572  1      0.2%
25.5333  1      0.2%
25.6639  1      0.2%
25.6666  1      0.2%
25.7738  1      0.2%
25.8119  1      0.2%
26.1481  2      0.4%
26.1858  1      0.2%
26.2134  1      0.2%
26.5367  1      0.2%

Value    Count  Frequency (%)
47.8978  1      0.2%
47.8977  1      0.2%
47.6901  1      0.2%
47.6631  1      0.2%
47.6034  1      0.2%
47.4221  1      0.2%
47.1173  1      0.2%
46.8393  1      0.2%
46.4604  1      0.2%
46.1548  1      0.2%

long
Real number (ℝ)

HIGH CORRELATION

Distinct: 469
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -86.3084996
Minimum: -123.0592
Maximum: -70.3899
Zeros: 0
Zeros (%): 0.0%
Negative: 499
Negative (%): 100.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: -123.0592
5-th percentile: -118.12652
Q1: -90.0043
median: -83.1895
Q3: -78.54575
95-th percentile: -73.07752
Maximum: -70.3899
Range: 52.6693
Interquartile range (IQR): 11.45855

Descriptive statistics

Standard deviation: 12.17469623
Coefficient of variation (CV): -0.1410602233
Kurtosis: 1.777511773
Mean: -86.3084996
Median Absolute Deviation (MAD): 5.7951
Skewness: -1.450253531
Sum: -43067.9413
Variance: 148.2232283
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
-97.822             4      0.8%
-76.8869            3      0.6%
-86.7861            3      0.6%
-85.7098            2      0.4%
-73.1225            2      0.4%
-84.4559            2      0.4%
-80.2088            2      0.4%
-73.9235            2      0.4%
-76.4042            2      0.4%
-81.0348            2      0.4%
Other values (459)  475    95.2%

Value      Count  Frequency (%)
-123.0592  1      0.2%
-123.0461  1      0.2%
-122.8865  1      0.2%
-122.6417  1      0.2%
-122.5091  1      0.2%
-122.3747  1      0.2%
-122.3414  1      0.2%
-122.3029  1      0.2%
-122.0693  1      0.2%
-122.0182  1      0.2%

Value     Count  Frequency (%)
-70.3899  1      0.2%
-70.4914  1      0.2%
-70.5627  1      0.2%
-70.8499  1      0.2%
-70.9465  1      0.2%
-70.95    2      0.4%
-71.054   1      0.2%
-71.0714  1      0.2%
-71.0951  1      0.2%
-71.1836  1      0.2%

sm_use
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Facebook: 258
Reddit: 133
Twitter: 108

Length

Max length: 8
Median length: 8
Mean length: 7.250501002
Min length: 6

Characters and Unicode

Total characters: 3618
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
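
The length and character totals for sm_use can be derived directly from the three category counts. The following is a minimal sketch of that derivation; the dictionary simply restates the counts shown for this variable, and the variable names are illustrative.

```python
# Category counts for sm_use, copied from this report.
counts = {"Facebook": 258, "Reddit": 133, "Twitter": 108}

n_rows = sum(counts.values())
# Each row contributes the length of its category label.
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / n_rows
# Distinct characters are case-sensitive ("F" and "f" would count separately).
distinct_chars = len(set("".join(counts)))

print(n_rows, total_chars, round(mean_length, 9), distinct_chars)
```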

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Facebook
2nd row: Twitter
3rd row: Facebook
4th row: Facebook
5th row: Twitter

Common Values

Value     Count  Frequency (%)
Facebook  258    51.7%
Reddit    133    26.7%
Twitter   108    21.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value     Count  Frequency (%)
facebook  258    51.7%
reddit    133    26.7%
twitter   108    21.6%

Most occurring characters

ValueCountFrequency (%)
o516
14.3%
e499
13.8%
t349
9.6%
d266
7.4%
F258
7.1%
a258
7.1%
c258
7.1%
b258
7.1%
k258
7.1%
i241
6.7%
Other values (4)457
12.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3119
86.2%
Uppercase Letter499
 
13.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o516
16.5%
e499
16.0%
t349
11.2%
d266
8.5%
a258
8.3%
c258
8.3%
b258
8.3%
k258
8.3%
i241
7.7%
w108
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
F258
51.7%
R133
26.7%
T108
21.6%

Most occurring scripts

ValueCountFrequency (%)
Latin3618
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o516
14.3%
e499
13.8%
t349
9.6%
d266
7.4%
F258
7.1%
a258
7.1%
c258
7.1%
b258
7.1%
k258
7.1%
i241
6.7%
Other values (4)457
12.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII3618
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o516
14.3%
e499
13.8%
t349
9.6%
d266
7.4%
F258
7.1%
a258
7.1%
c258
7.1%
b258
7.1%
k258
7.1%
i241
6.7%
Other values (4)457
12.6%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 60
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.66332665
Minimum: 18
Maximum: 78
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 18
5-th percentile: 23
Q1: 31
median: 39
Q3: 51.5
95-th percentile: 67
Maximum: 78
Range: 60
Interquartile range (IQR): 20.5

Descriptive statistics

Standard deviation: 13.63593166
Coefficient of variation (CV): 0.3272885954
Kurtosis: -0.5585557113
Mean: 41.66332665
Median Absolute Deviation (MAD): 10
Skewness: 0.5655176939
Sum: 20790
Variance: 185.9386323
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
35                 22     4.4%
34                 20     4.0%
37                 19     3.8%
29                 18     3.6%
27                 18     3.6%
26                 17     3.4%
44                 15     3.0%
31                 15     3.0%
38                 15     3.0%
23                 14     2.8%
Other values (50)  326    65.3%

Value  Count  Frequency (%)
18     1      0.2%
19     4      0.8%
20     3      0.6%
21     2      0.4%
22     2      0.4%
23     14     2.8%
24     8      1.6%
25     10     2.0%
26     17     3.4%
27     18     3.6%

Value  Count  Frequency (%)
78     1      0.2%
76     3      0.6%
75     1      0.2%
74     1      0.2%
73     2      0.4%
72     1      0.2%
71     2      0.4%
70     6      1.2%
69     3      0.6%
68     2      0.4%

gender_id
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Male: 282
Female: 207
Non-binary / third gender: 8
Prefer not to say: 2

Length

Max length25
Median length4
Mean length5.218436874
Min length4

Characters and Unicode

Total characters2604
Distinct characters23
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale
2nd rowMale
3rd rowFemale
4th rowFemale
5th rowFemale

Common Values

ValueCountFrequency (%)
Male282
56.5%
Female207
41.5%
Non-binary / third gender8
 
1.6%
Prefer not to say2
 
0.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
male282
53.3%
female207
39.1%
non-binary8
 
1.5%
8
 
1.5%
third8
 
1.5%
gender8
 
1.5%
prefer2
 
0.4%
not2
 
0.4%
to2
 
0.4%
say2
 
0.4%

Most occurring characters

ValueCountFrequency (%)
e716
27.5%
a499
19.2%
l489
18.8%
M282
 
10.8%
F207
 
7.9%
m207
 
7.9%
30
 
1.2%
r28
 
1.1%
n26
 
1.0%
d16
 
0.6%
Other values (13)104
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2059
79.1%
Uppercase Letter499
 
19.2%
Space Separator30
 
1.2%
Dash Punctuation8
 
0.3%
Other Punctuation8
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e716
34.8%
a499
24.2%
l489
23.7%
m207
 
10.1%
r28
 
1.4%
n26
 
1.3%
d16
 
0.8%
i16
 
0.8%
o12
 
0.6%
t12
 
0.6%
Other values (6)38
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
M282
56.5%
F207
41.5%
N8
 
1.6%
P2
 
0.4%
Space Separator
ValueCountFrequency (%)
30
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8
100.0%
Other Punctuation
ValueCountFrequency (%)
/8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2558
98.2%
Common46
 
1.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e716
28.0%
a499
19.5%
l489
19.1%
M282
 
11.0%
F207
 
8.1%
m207
 
8.1%
r28
 
1.1%
n26
 
1.0%
d16
 
0.6%
i16
 
0.6%
Other values (10)72
 
2.8%
Common
ValueCountFrequency (%)
30
65.2%
-8
 
17.4%
/8
 
17.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII2604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e716
27.5%
a499
19.2%
l489
18.8%
M282
 
10.8%
F207
 
7.9%
m207
 
7.9%
30
 
1.2%
r28
 
1.1%
n26
 
1.0%
d16
 
0.6%
Other values (13)104
 
4.0%

ethnic_id
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
White / Caucasian: 397
African-American: 32
Mixed race: 20
Hispanic: 19
Asian - Eastern: 16
Other values (7): 15

Length

Max length17
Median length17
Mean length16.15230461
Min length5

Characters and Unicode

Total characters8060
Distinct characters34
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)1.0%

Sample

1st rowAsian - Eastern
2nd rowMixed race
3rd rowPacific Islander
4th rowWhite / Caucasian
5th rowNative-American

Common Values

ValueCountFrequency (%)
White / Caucasian397
79.6%
African-American32
 
6.4%
Mixed race20
 
4.0%
Hispanic19
 
3.8%
Asian - Eastern16
 
3.2%
Asian - Indian7
 
1.4%
Native-American3
 
0.6%
Pacific Islander1
 
0.2%
Prefer not to say1
 
0.2%
Asian - Southeast1
 
0.2%
Other values (2)2
 
0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
421
30.8%
white397
29.1%
caucasian397
29.1%
african-american32
 
2.3%
asian24
 
1.8%
mixed20
 
1.5%
race20
 
1.5%
hispanic19
 
1.4%
eastern16
 
1.2%
indian7
 
0.5%
Other values (10)12
 
0.9%

Most occurring characters

ValueCountFrequency (%)
a1353
16.8%
i956
11.9%
866
10.7%
n540
 
6.7%
c505
 
6.3%
e497
 
6.2%
s459
 
5.7%
t421
 
5.2%
h399
 
5.0%
C398
 
4.9%
Other values (24)1666
20.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5782
71.7%
Uppercase Letter956
 
11.9%
Space Separator866
 
10.7%
Other Punctuation397
 
4.9%
Dash Punctuation59
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1353
23.4%
i956
16.5%
n540
 
9.3%
c505
 
8.7%
e497
 
8.6%
s459
 
7.9%
t421
 
7.3%
h399
 
6.9%
u398
 
6.9%
r109
 
1.9%
Other values (10)145
 
2.5%
Uppercase Letter
ValueCountFrequency (%)
C398
41.6%
W397
41.5%
A91
 
9.5%
M20
 
2.1%
H19
 
2.0%
E16
 
1.7%
I8
 
0.8%
N3
 
0.3%
P2
 
0.2%
S1
 
0.1%
Space Separator
ValueCountFrequency (%)
866
100.0%
Other Punctuation
ValueCountFrequency (%)
/397
100.0%
Dash Punctuation
ValueCountFrequency (%)
-59
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6738
83.6%
Common1322
 
16.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1353
20.1%
i956
14.2%
n540
 
8.0%
c505
 
7.5%
e497
 
7.4%
s459
 
6.8%
t421
 
6.2%
h399
 
5.9%
C398
 
5.9%
u398
 
5.9%
Other values (21)812
12.1%
Common
ValueCountFrequency (%)
866
65.5%
/397
30.0%
-59
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII8060
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1353
16.8%
i956
11.9%
866
10.7%
n540
 
6.7%
c505
 
6.3%
e497
 
6.2%
s459
 
5.7%
t421
 
5.2%
h399
 
5.0%
C398
 
4.9%
Other values (24)1666
20.7%

edu
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Bachelor's degree: 222
Highschool: 153
Master's degree or above: 87
Associate's degree: 22
Some college: 7
Other values (2): 8

Length

Max length24
Median length19
Mean length16.06412826
Min length10

Characters and Unicode

Total characters8016
Distinct characters27
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHighschool
2nd rowHighschool
3rd rowBachelor's degree
4th rowHighschool
5th rowHighschool

Common Values

ValueCountFrequency (%)
Bachelor's degree222
44.5%
Highschool153
30.7%
Master's degree or above87
 
17.4%
Associate's degree22
 
4.4%
Some college7
 
1.4%
Prefer not to say4
 
0.8%
Vocational training4
 
0.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
degree331
32.2%
bachelor's222
21.6%
highschool153
14.9%
master's87
 
8.5%
or87
 
8.5%
above87
 
8.5%
associate's22
 
2.1%
some7
 
0.7%
college7
 
0.7%
prefer4
 
0.4%
Other values (5)20
 
1.9%

Most occurring characters

ValueCountFrequency (%)
e1440
18.0%
o754
9.4%
r739
9.2%
s619
 
7.7%
h528
 
6.6%
528
 
6.6%
g495
 
6.2%
a434
 
5.4%
c408
 
5.1%
l393
 
4.9%
Other values (17)1678
20.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6658
83.1%
Space Separator528
 
6.6%
Uppercase Letter499
 
6.2%
Other Punctuation331
 
4.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1440
21.6%
o754
11.3%
r739
11.1%
s619
9.3%
h528
 
7.9%
g495
 
7.4%
a434
 
6.5%
c408
 
6.1%
l393
 
5.9%
d331
 
5.0%
Other values (8)517
 
7.8%
Uppercase Letter
ValueCountFrequency (%)
B222
44.5%
H153
30.7%
M87
 
17.4%
A22
 
4.4%
S7
 
1.4%
P4
 
0.8%
V4
 
0.8%
Space Separator
ValueCountFrequency (%)
528
100.0%
Other Punctuation
ValueCountFrequency (%)
'331
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7157
89.3%
Common859
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1440
20.1%
o754
10.5%
r739
10.3%
s619
8.6%
h528
 
7.4%
g495
 
6.9%
a434
 
6.1%
c408
 
5.7%
l393
 
5.5%
d331
 
4.6%
Other values (15)1016
14.2%
Common
ValueCountFrequency (%)
528
61.5%
'331
38.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII8016
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1440
18.0%
o754
9.4%
r739
9.2%
s619
 
7.7%
h528
 
6.6%
528
 
6.6%
g495
 
6.2%
a434
 
5.4%
c408
 
5.1%
l393
 
4.9%
Other values (17)1678
20.9%

politic_pref
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Very liberal: 150
Slightly liberal: 126
Slightly conservative: 96
Neutral/ Neither conservative or liberal: 89
Very conservative: 35

Length

Max length40
Median length21
Mean length20.11623246
Min length12

Characters and Unicode

Total characters10038
Distinct characters23
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSlightly liberal
2nd rowNeutral/ Neither conservative or liberal
3rd rowVery liberal
4th rowSlightly conservative
5th rowVery liberal

Common Values

ValueCountFrequency (%)
Very liberal150
30.1%
Slightly liberal126
25.3%
Slightly conservative96
19.2%
Neutral/ Neither conservative or liberal89
17.8%
Very conservative35
 
7.0%
Prefer not to say3
 
0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
liberal365
28.7%
slightly222
17.5%
conservative220
17.3%
very185
14.6%
neutral89
 
7.0%
neither89
 
7.0%
or89
 
7.0%
prefer3
 
0.2%
not3
 
0.2%
to3
 
0.2%

Most occurring characters

ValueCountFrequency (%)
e1263
12.6%
l1263
12.6%
r1043
10.4%
i896
 
8.9%
772
 
7.7%
a677
 
6.7%
t626
 
6.2%
v440
 
4.4%
y410
 
4.1%
b365
 
3.6%
Other values (13)2283
22.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8589
85.6%
Space Separator772
 
7.7%
Uppercase Letter588
 
5.9%
Other Punctuation89
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1263
14.7%
l1263
14.7%
r1043
12.1%
i896
10.4%
a677
7.9%
t626
7.3%
v440
 
5.1%
y410
 
4.8%
b365
 
4.2%
o315
 
3.7%
Other values (7)1291
15.0%
Uppercase Letter
ValueCountFrequency (%)
S222
37.8%
V185
31.5%
N178
30.3%
P3
 
0.5%
Space Separator
ValueCountFrequency (%)
772
100.0%
Other Punctuation
ValueCountFrequency (%)
/89
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9177
91.4%
Common861
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1263
13.8%
l1263
13.8%
r1043
11.4%
i896
9.8%
a677
 
7.4%
t626
 
6.8%
v440
 
4.8%
y410
 
4.5%
b365
 
4.0%
o315
 
3.4%
Other values (11)1879
20.5%
Common
ValueCountFrequency (%)
772
89.7%
/89
 
10.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII10038
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1263
12.6%
l1263
12.6%
r1043
10.4%
i896
 
8.9%
772
 
7.7%
a677
 
6.7%
t626
 
6.2%
v440
 
4.4%
y410
 
4.1%
b365
 
3.6%
Other values (13)2283
22.7%

sm_res_purp
Categorical

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Moderately aware: 128
Very aware: 119
Slightly aware: 117
Not at all aware: 76
Extremely aware: 59

Length

Max length16
Median length15
Mean length13.98196393
Min length10

Characters and Unicode

Total characters6977
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowExtremely aware
2nd rowModerately aware
3rd rowExtremely aware
4th rowModerately aware
5th rowExtremely aware

Common Values

ValueCountFrequency (%)
Moderately aware128
25.7%
Very aware119
23.8%
Slightly aware117
23.4%
Not at all aware76
15.2%
Extremely aware59
11.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
aware499
43.4%
moderately128
 
11.1%
very119
 
10.3%
slightly117
 
10.2%
not76
 
6.6%
at76
 
6.6%
all76
 
6.6%
extremely59
 
5.1%

Most occurring characters

ValueCountFrequency (%)
a1278
18.3%
e992
14.2%
r805
11.5%
651
9.3%
l573
8.2%
w499
 
7.2%
t456
 
6.5%
y423
 
6.1%
o204
 
2.9%
M128
 
1.8%
Other values (10)968
13.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5827
83.5%
Space Separator651
 
9.3%
Uppercase Letter499
 
7.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1278
21.9%
e992
17.0%
r805
13.8%
l573
9.8%
w499
 
8.6%
t456
 
7.8%
y423
 
7.3%
o204
 
3.5%
d128
 
2.2%
i117
 
2.0%
Other values (4)352
 
6.0%
Uppercase Letter
ValueCountFrequency (%)
M128
25.7%
V119
23.8%
S117
23.4%
N76
15.2%
E59
11.8%
Space Separator
ValueCountFrequency (%)
651
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6326
90.7%
Common651
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1278
20.2%
e992
15.7%
r805
12.7%
l573
9.1%
w499
 
7.9%
t456
 
7.2%
y423
 
6.7%
o204
 
3.2%
M128
 
2.0%
d128
 
2.0%
Other values (9)840
13.3%
Common
ValueCountFrequency (%)
651
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6977
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1278
18.3%
e992
14.2%
r805
11.5%
651
9.3%
l573
8.2%
w499
 
7.2%
t456
 
6.5%
y423
 
6.1%
o204
 
2.9%
M128
 
1.8%
Other values (10)968
13.9%

sm_aware
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 65
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect
154 
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect
47 
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are readily accessible to researchers and easy to collect
32 
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… are readily accessible to researchers and easy to collect
31 
… are large and can contain millions of data points,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect
 
18
Other values (60)
217 

Length

Max length547
Median length435
Mean length275.8496994
Min length17

Characters and Unicode

Total characters137649
Distinct characters32
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique26 ?
Unique (%)5.2%

Sample

1st row… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys)
2nd row… are large and can contain millions of data points
3rd row… are large and can contain millions of data points,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect
4th row… are large and can contain millions of data points,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect
5th row… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect

Common Values

ValueCountFrequency (%)
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect154
30.9%
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect47
 
9.4%
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are readily accessible to researchers and easy to collect32
 
6.4%
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… are readily accessible to researchers and easy to collect31
 
6.2%
… are large and can contain millions of data points,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect18
 
3.6%
… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys)17
 
3.4%
… are large and can contain millions of data points,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect17
 
3.4%
… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect14
 
2.8%
… are large and can contain millions of data points,… are readily accessible to researchers and easy to collect12
 
2.4%
… are readily accessible to researchers and easy to collect10
 
2.0%
Other values (55)147
29.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and1188
 
5.8%
are1165
 
5.6%
to1122
 
5.4%
can764
 
3.7%
researchers709
 
3.4%
in667
 
3.2%
not645
 
3.1%
492
 
2.4%
of432
 
2.1%
accessible413
 
2.0%
Other values (62)13040
63.2%

Most occurring characters

ValueCountFrequency (%)
20138
14.6%
e14990
10.9%
t10814
 
7.9%
a10523
 
7.6%
r9133
 
6.6%
n8728
 
6.3%
i8217
 
6.0%
o8153
 
5.9%
s7674
 
5.6%
c6935
 
5.0%
Other values (22)32344
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter112427
81.7%
Space Separator20138
 
14.6%
Other Punctuation3976
 
2.9%
Dash Punctuation371
 
0.3%
Open Punctuation349
 
0.3%
Close Punctuation349
 
0.3%
Final Punctuation32
 
< 0.1%
Uppercase Letter7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e14990
13.3%
t10814
9.6%
a10523
9.4%
r9133
8.1%
n8728
7.8%
i8217
7.3%
o8153
 
7.3%
s7674
 
6.8%
c6935
 
6.2%
l6779
 
6.0%
Other values (13)20481
18.2%
Other Punctuation
ValueCountFrequency (%)
1885
47.4%
,1393
35.0%
.698
 
17.6%
Space Separator
ValueCountFrequency (%)
20138
100.0%
Dash Punctuation
ValueCountFrequency (%)
-371
100.0%
Open Punctuation
ValueCountFrequency (%)
(349
100.0%
Close Punctuation
ValueCountFrequency (%)
)349
100.0%
Final Punctuation
ValueCountFrequency (%)
32
100.0%
Uppercase Letter
ValueCountFrequency (%)
N7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin112434
81.7%
Common25215
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e14990
13.3%
t10814
9.6%
a10523
9.4%
r9133
8.1%
n8728
7.8%
i8217
7.3%
o8153
 
7.3%
s7674
 
6.8%
c6935
 
6.2%
l6779
 
6.0%
Other values (14)20488
18.2%
Common
ValueCountFrequency (%)
20138
79.9%
1885
 
7.5%
,1393
 
5.5%
.698
 
2.8%
-371
 
1.5%
(349
 
1.4%
)349
 
1.4%
32
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII135732
98.6%
Punctuation1917
 
1.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20138
14.8%
e14990
11.0%
t10814
 
8.0%
a10523
 
7.8%
r9133
 
6.7%
n8728
 
6.4%
i8217
 
6.1%
o8153
 
6.0%
s7674
 
5.7%
c6935
 
5.1%
Other values (20)30427
22.4%
Punctuation
ValueCountFrequency (%)
1885
98.3%
32
 
1.7%

sm_expmt_inerct
Categorical

HIGH CORRELATION

Distinct: 22
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
None of the above
69 
Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots")
68 
Creating fake accounts ("bots")
57 
Privately messaging users,Creating fake accounts ("bots")
50 
Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots"),Secretly changing the content of what users see
40 
Other values (17)
215 

Length

Max length: 170
Median length: 109
Mean length: 66.62925852
Min length: 17

Characters and Unicode

Total characters: 33248
Distinct characters: 32
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
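
As a minimal sketch of how such per-category counts can be derived (not necessarily how the report generator does it), Python's standard unicodedata module maps each character to a two-letter general-category code, for example Ll for Lowercase Letter or Zs for Space Separator. The helper name category_counts is hypothetical, and the two sample strings are values listed for this variable.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across an iterable of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as "Ll" or "Zs".
            counts[unicodedata.category(ch)] += 1
    return counts


# Two of the category values reported for this variable (illustrative subset only).
sample = ['Creating fake accounts ("bots")', "None of the above"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

On these two values the tally already contains six distinct categories (Ll, Zs, Lu, Po, Ps, Pe), which is the number of distinct categories the report lists for the full column.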

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: Creating fake accounts ("bots"),Secretly changing the content of what users see
2nd row: Privately messaging users,Publicly posting on users' profiles,Secretly changing the content of what users see
3rd row: Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots"),Secretly changing the content of what users see
4th row: Creating fake accounts ("bots")
5th row: Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots")

Common Values

Value | Count | Frequency (%)
None of the above | 69 | 13.8%
Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots") | 68 | 13.6%
Creating fake accounts ("bots") | 57 | 11.4%
Privately messaging users,Creating fake accounts ("bots") | 50 | 10.0%
Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots"),Secretly changing the content of what users see | 40 | 8.0%
Creating fake accounts ("bots"),Secretly changing the content of what users see | 37 | 7.4%
Privately messaging users,Publicly posting on users' profiles | 33 | 6.6%
Privately messaging users | 29 | 5.8%
Publicly posting on users' profiles,Creating fake accounts ("bots") | 23 | 4.6%
Privately messaging users,Creating fake accounts ("bots"),Secretly changing the content of what users see | 22 | 4.4%
Other values (12) | 71 | 14.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
users406
 
9.8%
accounts348
 
8.4%
fake331
 
8.0%
privately258
 
6.2%
messaging258
 
6.2%
the215
 
5.2%
of215
 
5.2%
posting214
 
5.2%
on214
 
5.2%
bots198
 
4.8%
Other values (20)1487
35.9%

Most occurring characters

ValueCountFrequency (%)
3645
 
11.0%
e3110
 
9.4%
s3039
 
9.1%
t2298
 
6.9%
n2052
 
6.2%
a1904
 
5.7%
o1837
 
5.5%
i1669
 
5.0%
r1584
 
4.8%
g1370
 
4.1%
Other values (22)10740
32.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter26477
79.6%
Space Separator3645
 
11.0%
Other Punctuation1429
 
4.3%
Uppercase Letter1035
 
3.1%
Close Punctuation331
 
1.0%
Open Punctuation331
 
1.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3110
11.7%
s3039
11.5%
t2298
 
8.7%
n2052
 
7.8%
a1904
 
7.2%
o1837
 
6.9%
i1669
 
6.3%
r1584
 
6.0%
g1370
 
5.2%
c1365
 
5.2%
Other values (11)6249
23.6%
Uppercase Letter
ValueCountFrequency (%)
P472
45.6%
C331
32.0%
S146
 
14.1%
N69
 
6.7%
H17
 
1.6%
Other Punctuation
ValueCountFrequency (%)
"662
46.3%
,536
37.5%
'231
 
16.2%
Space Separator
ValueCountFrequency (%)
3645
100.0%
Close Punctuation
ValueCountFrequency (%)
)331
100.0%
Open Punctuation
ValueCountFrequency (%)
(331
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin27512
82.7%
Common5736
 
17.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3110
11.3%
s3039
11.0%
t2298
 
8.4%
n2052
 
7.5%
a1904
 
6.9%
o1837
 
6.7%
i1669
 
6.1%
r1584
 
5.8%
g1370
 
5.0%
c1365
 
5.0%
Other values (16)7284
26.5%
Common
ValueCountFrequency (%)
3645
63.5%
"662
 
11.5%
,536
 
9.3%
)331
 
5.8%
(331
 
5.8%
'231
 
4.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII33248
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3645
 
11.0%
e3110
 
9.4%
s3039
 
9.1%
t2298
 
6.9%
n2052
 
6.2%
a1904
 
5.7%
o1837
 
5.5%
i1669
 
5.0%
r1584
 
4.8%
g1370
 
4.1%
Other values (22)10740
32.3%

sm_data_use
Categorical

HIGH CARDINALITY

Distinct: 162
Distinct (%): 32.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks
167 
Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks
 
24
Political elections (e.g. voting behavior),Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks
 
10
Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation)
 
10
Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation)
 
9
Other values (157)
279 

Length

Max length: 345
Median length: 295
Mean length: 259.8757515
Min length: 15

Characters and Unicode

Total characters: 129678
Distinct characters: 34
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 98
Unique (%): 19.6%

Sample

1st row: Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks
2nd row: Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks
3rd row: Political elections (e.g. voting behavior),Presidential approval ratings,Communication (e.g. spread of opinions and hate-speech),News consumption (e.g. sharing of misinformation),Social networks
4th row: Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation)
5th row: Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks

Common Values

Value | Count | Frequency (%)
Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 167 | 33.5%
Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 24 | 4.8%
Political elections (e.g. voting behavior),Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 10 | 2.0%
Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation) | 10 | 2.0%
Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation) | 9 | 1.8%
Political elections (e.g. voting behavior),Presidential approval ratings,Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 9 | 1.8%
Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 7 | 1.4%
Political elections (e.g. voting behavior),Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 7 | 1.4%
Communication (e.g. spread of opinions and hate-speech),News consumption (e.g. sharing of misinformation),Social networks | 6 | 1.2%
Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | 5 | 1.0%
Other values (152) | 245 | 49.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
e.g1982
 
16.1%
of1177
 
9.6%
spread753
 
6.1%
and729
 
5.9%
sharing419
 
3.4%
consumption419
 
3.4%
environment-related412
 
3.3%
sentiment412
 
3.3%
opinions404
 
3.3%
political398
 
3.2%
Other values (55)5197
42.2%

Most occurring characters

ValueCountFrequency (%)
11803
 
9.1%
e11790
 
9.1%
i10844
 
8.4%
n10671
 
8.2%
o10085
 
7.8%
s7814
 
6.0%
a7674
 
5.9%
t7150
 
5.5%
c5740
 
4.4%
r4872
 
3.8%
Other values (24)41235
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter102739
79.2%
Space Separator11803
 
9.1%
Other Punctuation6748
 
5.2%
Uppercase Letter3283
 
2.5%
Close Punctuation1982
 
1.5%
Open Punctuation1982
 
1.5%
Dash Punctuation1141
 
0.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e11790
11.5%
i10844
10.6%
n10671
10.4%
o10085
9.8%
s7814
 
7.6%
a7674
 
7.5%
t7150
 
7.0%
c5740
 
5.6%
r4872
 
4.7%
l4064
 
4.0%
Other values (11)22035
21.4%
Uppercase Letter
ValueCountFrequency (%)
P1148
35.0%
N424
 
12.9%
C404
 
12.3%
S371
 
11.3%
H349
 
10.6%
W325
 
9.9%
E262
 
8.0%
Other Punctuation
ValueCountFrequency (%)
.3964
58.7%
,2784
41.3%
Space Separator
ValueCountFrequency (%)
11803
100.0%
Close Punctuation
ValueCountFrequency (%)
)1982
100.0%
Open Punctuation
ValueCountFrequency (%)
(1982
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1141
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin106022
81.8%
Common23656
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e11790
11.1%
i10844
10.2%
n10671
10.1%
o10085
 
9.5%
s7814
 
7.4%
a7674
 
7.2%
t7150
 
6.7%
c5740
 
5.4%
r4872
 
4.6%
l4064
 
3.8%
Other values (18)25318
23.9%
Common
ValueCountFrequency (%)
11803
49.9%
.3964
 
16.8%
,2784
 
11.8%
)1982
 
8.4%
(1982
 
8.4%
-1141
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII129678
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11803
 
9.1%
e11790
 
9.1%
i10844
 
8.4%
n10671
 
8.2%
o10085
 
7.8%
s7814
 
6.0%
a7674
 
5.9%
t7150
 
5.5%
c5740
 
4.4%
r4872
 
3.8%
Other values (24)41235
31.8%

ethic_appr
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 498
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Ethics approval is needed for any research that involves human participants; their tissue and /or data to ensure that the dignity, rights, safety and well-being of all participants are the primary consideration of the research project.
 
2
The scope of the project and actions there in do not cross certain boundaries that may purposefully negatively affect participants as well as legal regulations and standard practices.
 
1
That they are going to use the information they receive appropriately. They are not going to manipulate and misuse what they gather.
 
1
It means that, in the opinion of the institution, the study and its methods are morally acceptable.
 
1
Ethical approval means getting clearance to obtain data from a research subject.
 
1
Other values (493)
493 

Length

Max length: 1026
Median length: 207
Mean length: 134.7935872
Min length: 15

Characters and Unicode

Total characters: 67262
Distinct characters: 66
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 497
Unique (%): 99.6%

Sample

1st row: The scope of the project and actions there in do not cross certain boundaries that may purposefully negatively affect participants as well as legal regulations and standard practices.
2nd row: I think Ethical Approval means that the experiment is gathering data without harm or injury to people.
3rd row: Researchers focus on ethical standards towards those they gain data from. They need approval of their approach and receive methods.
4th row: I would think that using "ethical approval" means that the things others collect on social media sites would need to be honest and moral. Hopefully, there would be no under-handedness used in collecting information.
5th row: A set of rules of what to do and what to not do.

Common Values

Value | Count | Frequency (%)
Ethics approval is needed for any research that involves human participants; their tissue and /or data to ensure that the dignity, rights, safety and well-being of all participants are the primary consideration of the research project. | 2 | 0.4%
The scope of the project and actions there in do not cross certain boundaries that may purposefully negatively affect participants as well as legal regulations and standard practices. | 1 | 0.2%
That they are going to use the information they receive appropriately. They are not going to manipulate and misuse what they gather. | 1 | 0.2%
It means that, in the opinion of the institution, the study and its methods are morally acceptable. | 1 | 0.2%
Ethical approval means getting clearance to obtain data from a research subject. | 1 | 0.2%
It means receiving approval from an IRB or other institution that has oversight over study approval. They make sure the studies to not hamr their subjects. | 1 | 0.2%
Ethical approval from the institution means they will act in a way that responsible and takes in to account the persons they are researching. | 1 | 0.2%
Proof that the experiment is not done against people's wills and if people ask, all data will be deleted. | 1 | 0.2%
There is (or should be ) oversight from someone in charge, and who is ethical. | 1 | 0.2%
Morally correct. | 1 | 0.2%
Other values (488) | 488 | 97.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the736
 
6.5%
to474
 
4.2%
that434
 
3.8%
and295
 
2.6%
is249
 
2.2%
ethical249
 
2.2%
of227
 
2.0%
it217
 
1.9%
they212
 
1.9%
approval201
 
1.8%
Other values (1400)8064
71.0%

Most occurring characters

ValueCountFrequency (%)
11017
16.4%
e6612
 
9.8%
t6115
 
9.1%
a4947
 
7.4%
i4019
 
6.0%
o3823
 
5.7%
n3645
 
5.4%
s3457
 
5.1%
r3449
 
5.1%
h3300
 
4.9%
Other values (56)16878
25.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter54478
81.0%
Space Separator11017
 
16.4%
Other Punctuation966
 
1.4%
Uppercase Letter705
 
1.0%
Dash Punctuation36
 
0.1%
Open Punctuation22
 
< 0.1%
Close Punctuation22
 
< 0.1%
Final Punctuation8
 
< 0.1%
Control7
 
< 0.1%
Decimal Number1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6612
12.1%
t6115
11.2%
a4947
 
9.1%
i4019
 
7.4%
o3823
 
7.0%
n3645
 
6.7%
s3457
 
6.3%
r3449
 
6.3%
h3300
 
6.1%
l2057
 
3.8%
Other values (16)13054
24.0%
Uppercase Letter
ValueCountFrequency (%)
I257
36.5%
T144
20.4%
E91
 
12.9%
A46
 
6.5%
B24
 
3.4%
R23
 
3.3%
M19
 
2.7%
W15
 
2.1%
P9
 
1.3%
S9
 
1.3%
Other values (15)68
 
9.6%
Other Punctuation
ValueCountFrequency (%)
.535
55.4%
,234
24.2%
'141
 
14.6%
"28
 
2.9%
/12
 
1.2%
?10
 
1.0%
;4
 
0.4%
:2
 
0.2%
Space Separator
ValueCountFrequency (%)
11017
100.0%
Dash Punctuation
ValueCountFrequency (%)
-36
100.0%
Open Punctuation
ValueCountFrequency (%)
(22
100.0%
Close Punctuation
ValueCountFrequency (%)
)22
100.0%
Final Punctuation
ValueCountFrequency (%)
8
100.0%
Control
ValueCountFrequency (%)
7
100.0%
Decimal Number
Value | Count | Frequency (%)
3 | 1 | 100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin55183
82.0%
Common12079
 
18.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6612
12.0%
t6115
11.1%
a4947
 
9.0%
i4019
 
7.3%
o3823
 
6.9%
n3645
 
6.6%
s3457
 
6.3%
r3449
 
6.3%
h3300
 
6.0%
l2057
 
3.7%
Other values (41)13759
24.9%
Common
ValueCountFrequency (%)
11017
91.2%
.535
 
4.4%
,234
 
1.9%
'141
 
1.2%
-36
 
0.3%
"28
 
0.2%
(22
 
0.2%
)22
 
0.2%
/12
 
0.1%
?10
 
0.1%
Other values (5)22
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII67254
> 99.9%
Punctuation8
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11017
16.4%
e6612
 
9.8%
t6115
 
9.1%
a4947
 
7.4%
i4019
 
6.0%
o3823
 
5.7%
n3645
 
5.4%
s3457
 
5.1%
r3449
 
5.1%
h3300
 
4.9%
Other values (55)16870
25.1%
Punctuation
ValueCountFrequency (%)
8
100.0%

study_1_ethic_acc
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Completely acceptable
157 
Somewhat acceptable
144 
Somewhat unacceptable
81 
Neutral
62 
Completey unacceptable
55 

Length

Max length: 22
Median length: 21
Mean length: 18.79358717
Min length: 7

Characters and Unicode

Total characters: 9378
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Neutral
2nd row: Completely acceptable
3rd row: Completely acceptable
4th row: Neutral
5th row: Completely acceptable

Common Values

Value | Count | Frequency (%)
Completely acceptable | 157 | 31.5%
Somewhat acceptable | 144 | 28.9%
Somewhat unacceptable | 81 | 16.2%
Neutral | 62 | 12.4%
Completey unacceptable | 55 | 11.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
acceptable301
32.2%
somewhat225
24.0%
completely157
16.8%
unacceptable136
14.5%
neutral62
 
6.6%
completey55
 
5.9%

Most occurring characters

ValueCountFrequency (%)
e1585
16.9%
a1161
12.4%
t936
10.0%
c874
9.3%
l868
9.3%
p649
6.9%
m437
 
4.7%
437
 
4.7%
o437
 
4.7%
b437
 
4.7%
Other values (9)1557
16.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8442
90.0%
Uppercase Letter499
 
5.3%
Space Separator437
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1585
18.8%
a1161
13.8%
t936
11.1%
c874
10.4%
l868
10.3%
p649
7.7%
m437
 
5.2%
o437
 
5.2%
b437
 
5.2%
w225
 
2.7%
Other values (5)833
9.9%
Uppercase Letter
ValueCountFrequency (%)
S225
45.1%
C212
42.5%
N62
 
12.4%
Space Separator
ValueCountFrequency (%)
437
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8941
95.3%
Common437
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1585
17.7%
a1161
13.0%
t936
10.5%
c874
9.8%
l868
9.7%
p649
7.3%
m437
 
4.9%
o437
 
4.9%
b437
 
4.9%
w225
 
2.5%
Other values (8)1332
14.9%
Common
ValueCountFrequency (%)
437
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9378
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1585
16.9%
a1161
12.4%
t936
10.0%
c874
9.3%
l868
9.3%
p649
6.9%
m437
 
4.7%
437
 
4.7%
o437
 
4.7%
b437
 
4.7%
Other values (9)1557
16.6%

study_1_conc
Categorical

HIGH CARDINALITY
MISSING

Distinct: 270
Distinct (%): 91.5%
Missing: 204
Missing (%): 40.9%
Memory size: 4.0 KiB
na
 
20
Na
 
7
Same as before, this is all public and anyone can do these things, so I have no issue with it.
 
1
No informed consent. Troll farms and the people that pay them should be illegal.
 
1
Again, I feel the users should be informed on the situation and what is occurring.
 
1
Other values (265)
265 

Length

Max length: 812
Median length: 198
Mean length: 117.9084746
Min length: 2

Characters and Unicode

Total characters: 34783
Distinct characters: 69
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 268
Unique (%): 90.8%

Sample

1st row: No concerns. I would have loved to partake in this study in terms of watching the results.
2nd row: I feel if people know they are being judged they will act, speak, or write differently than if they don't know they are being analyzed.
3rd row: na
4th row: Easy enough for an outside government to try copying such a study with the sole purpose of creating much more polarization, hate, etc. Not that it hasn't been tried and tested perhaps innumerable times by all types of foreign or domestic entities as far as we know. No actual study would have really been needed to know that using a type of marketing manipulation could alter the recipients mood/levels of concern/anxiety/hate/etc.
5th row: The participants were not Were of this research study being conducted. Therefore it is unethical

Common Values

Value | Count | Frequency (%)
na | 20 | 4.0%
Na | 7 | 1.4%
Same as before, this is all public and anyone can do these things, so I have no issue with it. | 1 | 0.2%
No informed consent. Troll farms and the people that pay them should be illegal. | 1 | 0.2%
Again, I feel the users should be informed on the situation and what is occurring. | 1 | 0.2%
My main concern remains that individuals were not giving informed consent. But, with the understanding that the posts are in the public domain - not just shared to friends - perhaps that is not really an ethical issue? I'm a bit torn here. | 1 | 0.2%
Even though users were not informed about being in this research, it ultimately was for a good cause and helpful in making progress towards less hate speech on social media. | 1 | 0.2%
I feel its quit acceptable posting something that reduces the hate in general also since it helps one to rethink their post. | 1 | 0.2%
In order for the study's results to be accurate users couldn't know that the researchers were running a study. | 1 | 0.2%
Using people's data without consent for a study seems unethical. | 1 | 0.2%
Other values (260) | 260 | 52.1%
(Missing) | 204 | 40.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the297
 
4.9%
to158
 
2.6%
that157
 
2.6%
i154
 
2.5%
a141
 
2.3%
of134
 
2.2%
is125
 
2.0%
they121
 
2.0%
study108
 
1.8%
not106
 
1.7%
Other values (1071)4617
75.5%

Most occurring characters

ValueCountFrequency (%)
5913
17.0%
e3747
10.8%
t3055
 
8.8%
a2226
 
6.4%
i1925
 
5.5%
o1870
 
5.4%
s1834
 
5.3%
n1813
 
5.2%
h1582
 
4.5%
r1497
 
4.3%
Other values (59)9321
26.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter27561
79.2%
Space Separator5913
 
17.0%
Other Punctuation727
 
2.1%
Uppercase Letter524
 
1.5%
Control14
 
< 0.1%
Dash Punctuation13
 
< 0.1%
Decimal Number9
 
< 0.1%
Close Punctuation8
 
< 0.1%
Final Punctuation7
 
< 0.1%
Open Punctuation7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3747
13.6%
t3055
11.1%
a2226
 
8.1%
i1925
 
7.0%
o1870
 
6.8%
s1834
 
6.7%
n1813
 
6.6%
h1582
 
5.7%
r1497
 
5.4%
d988
 
3.6%
Other values (16)7024
25.5%
Uppercase Letter
ValueCountFrequency (%)
I214
40.8%
T91
17.4%
A39
 
7.4%
P28
 
5.3%
N27
 
5.2%
S17
 
3.2%
W15
 
2.9%
M12
 
2.3%
E12
 
2.3%
H11
 
2.1%
Other values (12)58
 
11.1%
Other Punctuation
ValueCountFrequency (%)
.365
50.2%
,161
22.1%
'128
 
17.6%
"38
 
5.2%
?17
 
2.3%
/9
 
1.2%
!5
 
0.7%
:3
 
0.4%
;1
 
0.1%
Decimal Number
Value | Count | Frequency (%)
0 | 5 | 55.6%
2 | 2 | 22.2%
1 | 1 | 11.1%
4 | 1 | 11.1%
Close Punctuation
ValueCountFrequency (%)
)6
75.0%
]2
 
25.0%
Open Punctuation
ValueCountFrequency (%)
(5
71.4%
[2
 
28.6%
Space Separator
ValueCountFrequency (%)
5913
100.0%
Control
ValueCountFrequency (%)
14
100.0%
Dash Punctuation
ValueCountFrequency (%)
-13
100.0%
Final Punctuation
ValueCountFrequency (%)
7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin28085
80.7%
Common6698
 
19.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3747
13.3%
t3055
10.9%
a2226
 
7.9%
i1925
 
6.9%
o1870
 
6.7%
s1834
 
6.5%
n1813
 
6.5%
h1582
 
5.6%
r1497
 
5.3%
d988
 
3.5%
Other values (38)7548
26.9%
Common
ValueCountFrequency (%)
5913
88.3%
.365
 
5.4%
,161
 
2.4%
'128
 
1.9%
"38
 
0.6%
?17
 
0.3%
14
 
0.2%
-13
 
0.2%
/9
 
0.1%
7
 
0.1%
Other values (11)33
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII34776
> 99.9%
Punctuation7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5913
17.0%
e3747
10.8%
t3055
 
8.8%
a2226
 
6.4%
i1925
 
5.5%
o1870
 
5.4%
s1834
 
5.3%
n1813
 
5.2%
h1582
 
4.5%
r1497
 
4.3%
Other values (58)9314
26.8%
Punctuation
ValueCountFrequency (%)
7
100.0%

study_1_add_info
Categorical

HIGH CARDINALITY
MISSING

Distinct: 107
Distinct (%): 73.3%
Missing: 353
Missing (%): 70.7%
Memory size: 4.0 KiB
na
30 
Na
 
8
no
 
2
N/a
 
2
NA.
 
2
Other values (102)
102 

Length

Max length: 316
Median length: 161.5
Mean length: 64.34246575
Min length: 2

Characters and Unicode

Total characters: 9394
Distinct characters: 59
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 69.9%

Sample

1st row: na
2nd row: I would be interested to know what kind of messages they sent the hate speech users that got them to change their minds.
3rd row: Full disclosure of intent of researchers.
4th row: See comments from previous studies
5th row: Na

Common Values

Value | Count | Frequency (%)
na | 30 | 6.0%
Na | 8 | 1.6%
no | 2 | 0.4%
N/a | 2 | 0.4%
NA. | 2 | 0.4%
The fact that the fake accounts were used to try and suppress hate speech, makes it more ethical in my opinion. | 1 | 0.2%
I know that I certainly wouldn't like to be part of an experiment without me person consenting to it. | 1 | 0.2%
I would love to see the message that was posted by the researches to what was posted in order to see what was said and how it was worded. | 1 | 0.2%
No one is getting harmed or misinformed here. the study is actually trying to help people, so i think it's acceptable even though they did not know they were in a study. | 1 | 0.2%
I would like to know more about what the replies actually said. | 1 | 0.2%
Other values (97) | 97 | 19.4%
(Missing) | 353 | 70.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the94
 
5.6%
to73
 
4.3%
i46
 
2.7%
na42
 
2.5%
it35
 
2.1%
they34
 
2.0%
of32
 
1.9%
would28
 
1.7%
were27
 
1.6%
was24
 
1.4%
Other values (489)1250
74.2%

Most occurring characters

ValueCountFrequency (%)
1573
16.7%
e1052
11.2%
t808
 
8.6%
a590
 
6.3%
o562
 
6.0%
s507
 
5.4%
n467
 
5.0%
i450
 
4.8%
h414
 
4.4%
r378
 
4.0%
Other values (49)2593
27.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7429
79.1%
Space Separator1573
 
16.7%
Other Punctuation194
 
2.1%
Uppercase Letter165
 
1.8%
Decimal Number12
 
0.1%
Dash Punctuation6
 
0.1%
Close Punctuation6
 
0.1%
Open Punctuation6
 
0.1%
Control2
 
< 0.1%
Final Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1052
14.2%
t808
10.9%
a590
 
7.9%
o562
 
7.6%
s507
 
6.8%
n467
 
6.3%
i450
 
6.1%
h414
 
5.6%
r378
 
5.1%
l282
 
3.8%
Other values (16)1919
25.8%
Uppercase Letter
ValueCountFrequency (%)
I70
42.4%
N24
 
14.5%
T13
 
7.9%
W12
 
7.3%
A10
 
6.1%
H7
 
4.2%
S5
 
3.0%
P5
 
3.0%
M3
 
1.8%
O3
 
1.8%
Other values (7)13
 
7.9%
Other Punctuation
ValueCountFrequency (%)
.105
54.1%
,44
22.7%
'24
 
12.4%
?14
 
7.2%
/4
 
2.1%
"2
 
1.0%
%1
 
0.5%
Decimal Number
ValueCountFrequency (%)
08
66.7%
13
 
25.0%
41
 
8.3%
Space Separator
ValueCountFrequency (%)
1573
100.0%
Dash Punctuation
ValueCountFrequency (%)
-6
100.0%
Close Punctuation
ValueCountFrequency (%)
)6
100.0%
Open Punctuation
ValueCountFrequency (%)
(6
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7594
80.8%
Common1800
 
19.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1052
13.9%
t808
 
10.6%
a590
 
7.8%
o562
 
7.4%
s507
 
6.7%
n467
 
6.1%
i450
 
5.9%
h414
 
5.5%
r378
 
5.0%
l282
 
3.7%
Other values (33)2084
27.4%
Common
ValueCountFrequency (%)
1573
87.4%
.105
 
5.8%
,44
 
2.4%
'24
 
1.3%
?14
 
0.8%
08
 
0.4%
-6
 
0.3%
)6
 
0.3%
(6
 
0.3%
/4
 
0.2%
Other values (6)10
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII9393
> 99.9%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1573
16.7%
e1052
11.2%
t808
 
8.6%
a590
 
6.3%
o562
 
6.0%
s507
 
5.4%
n467
 
5.0%
i450
 
4.8%
h414
 
4.4%
r378
 
4.0%
Other values (48)2592
27.6%
Punctuation
ValueCountFrequency (%)
1
100.0%

study_2_ethic_acc
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value | Count
Somewhat acceptable | 133
Completely acceptable | 116
Somewhat unacceptable | 111
Neutral | 82
Completely unacceptable | 57

Length

Max length: 23
Median length: 21
Mean length: 18.39478958
Min length: 7

Characters and Unicode

Total characters: 9179
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
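
As a rough illustration of how per-category character counts like the ones in this report can be derived, the sketch below uses Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` lookup are hypothetical and not taken from the profiling tool; the tool's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
# The two-letter codes are the standard Unicode abbreviations.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pf": "Final Punctuation",
    "Sc": "Currency Symbol",
    "Cc": "Control",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Illustrative input only: three of the answer labels from this variable.
sample = ["Neutral", "Somewhat acceptable", "Completely unacceptable"]
counts = category_counts(sample)
print(counts.most_common())
```

Applied to a full column (with missing values dropped first), the same loop would yield the Lowercase Letter, Uppercase Letter and Space Separator totals listed under "Most occurring categories".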

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Neutral
2nd row: Completely acceptable
3rd row: Completely acceptable
4th row: Somewhat acceptable
5th row: Completely acceptable

Common Values

Value | Count | Frequency (%)
Somewhat acceptable | 133 | 26.7%
Completely acceptable | 116 | 23.2%
Somewhat unacceptable | 111 | 22.2%
Neutral | 82 | 16.4%
Completely unacceptable | 57 | 11.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
acceptable249
27.2%
somewhat244
26.6%
completely173
18.9%
unacceptable168
18.3%
neutral82
 
9.0%

Most occurring characters

ValueCountFrequency (%)
e1506
16.4%
a1160
12.6%
t916
10.0%
l845
9.2%
c834
9.1%
p590
 
6.4%
b417
 
4.5%
m417
 
4.5%
417
 
4.5%
o417
 
4.5%
Other values (9)1660
18.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8263
90.0%
Uppercase Letter499
 
5.4%
Space Separator417
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1506
18.2%
a1160
14.0%
t916
11.1%
l845
10.2%
c834
10.1%
p590
 
7.1%
b417
 
5.0%
m417
 
5.0%
o417
 
5.0%
u250
 
3.0%
Other values (5)911
11.0%
Uppercase Letter
ValueCountFrequency (%)
S244
48.9%
C173
34.7%
N82
 
16.4%
Space Separator
ValueCountFrequency (%)
417
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8762
95.5%
Common417
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1506
17.2%
a1160
13.2%
t916
10.5%
l845
9.6%
c834
9.5%
p590
 
6.7%
b417
 
4.8%
m417
 
4.8%
o417
 
4.8%
u250
 
2.9%
Other values (8)1410
16.1%
Common
ValueCountFrequency (%)
417
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9179
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1506
16.4%
a1160
12.6%
t916
10.0%
l845
9.2%
c834
9.1%
p590
 
6.4%
b417
 
4.5%
m417
 
4.5%
417
 
4.5%
o417
 
4.5%
Other values (9)1660
18.1%

study_2_conc
Categorical

HIGH CARDINALITY
MISSING

Distinct: 283
Distinct (%): 91.9%
Missing: 191
Missing (%): 38.3%
Memory size: 4.0 KiB

Value | Count
na | 19
Na | 8
Slightly unethical to not alert participants they are part of a research study | 1
I think it would have been more ethical if the researchers had simply commented on the public post rather than send an unsolicited private message. | 1
They should not have private messaged the users | 1
Other values (278) | 278

Length

Max length: 647
Median length: 187
Mean length: 113.0357143
Min length: 2

Characters and Unicode

Total characters: 34815
Distinct characters: 66
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 281
Unique (%): 91.2%

Sample

1st row: Going to the poster privately provided opportunity for change without the possibly of increased toxicity from users. I prefer this method over commenting the "correct information".
2nd row: I feel as though, in the above case, users had a choice to respond or not so I think it was honest.
3rd row: na
4th row: It's perfectly within someone's right to send someone else a message on any platform, therefore I believe this study was acceptable.
5th row: This is unethical because those involved were not adequately informed of the researchers intent.

Common Values

Value | Count | Frequency (%)
na | 19 | 3.8%
Na | 8 | 1.6%
Slightly unethical to not alert participants they are part of a research study | 1 | 0.2%
I think it would have been more ethical if the researchers had simply commented on the public post rather than send an unsolicited private message. | 1 | 0.2%
They should not have private messaged the users | 1 | 0.2%
Sending someone a private message does give me concern, that the recipient might have confused the sender for someone they know. However, the results were interesting to say the least. | 1 | 0.2%
My concern with this study is that the participants were not aware of the research study being conducted. | 1 | 0.2%
Participants should be informed they are part of a research project upfront. The use of bots to automate their process I disagree with. | 1 | 0.2%
Using bots on unaware citizens without their consent would seem to be unethical. Not to mention the famous case of the LinkedIn founder funding the troll farm "research" to manipulate the Alabama special election. Unethical if not criminal. | 1 | 0.2%
Not ethical at all to me. I totally find this is not ethical to have users information collected in this way. Especially being a responder on Prolific and seeing all of the verbiage that we have to read and agree to and learning about the things that researchers have to do to maintain a good pool of respondents, this does not make sense at all to me. | 1 | 0.2%
Other values (273) | 273 | 54.7%
(Missing) | 191 | 38.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the285
 
4.7%
to170
 
2.8%
i143
 
2.4%
a142
 
2.4%
of135
 
2.2%
they132
 
2.2%
that126
 
2.1%
not122
 
2.0%
study102
 
1.7%
is102
 
1.7%
Other values (1023)4560
75.8%

Most occurring characters

ValueCountFrequency (%)
5798
16.7%
e3664
 
10.5%
t2988
 
8.6%
a2259
 
6.5%
i2007
 
5.8%
n1963
 
5.6%
s1943
 
5.6%
o1924
 
5.5%
r1584
 
4.5%
h1471
 
4.2%
Other values (56)9214
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter27758
79.7%
Space Separator5798
 
16.7%
Other Punctuation693
 
2.0%
Uppercase Letter506
 
1.5%
Dash Punctuation32
 
0.1%
Close Punctuation9
 
< 0.1%
Open Punctuation8
 
< 0.1%
Control4
 
< 0.1%
Final Punctuation4
 
< 0.1%
Decimal Number3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3664
13.2%
t2988
10.8%
a2259
 
8.1%
i2007
 
7.2%
n1963
 
7.1%
s1943
 
7.0%
o1924
 
6.9%
r1584
 
5.7%
h1471
 
5.3%
d992
 
3.6%
Other values (16)6963
25.1%
Uppercase Letter
ValueCountFrequency (%)
I198
39.1%
T104
20.6%
A36
 
7.1%
P31
 
6.1%
N20
 
4.0%
M15
 
3.0%
S15
 
3.0%
W14
 
2.8%
O11
 
2.2%
D10
 
2.0%
Other values (12)52
 
10.3%
Other Punctuation
ValueCountFrequency (%)
.353
50.9%
,142
20.5%
'132
 
19.0%
"34
 
4.9%
?21
 
3.0%
/6
 
0.9%
;3
 
0.4%
!1
 
0.1%
:1
 
0.1%
Decimal Number
ValueCountFrequency (%)
41
33.3%
21
33.3%
11
33.3%
Space Separator
ValueCountFrequency (%)
5798
100.0%
Dash Punctuation
ValueCountFrequency (%)
-32
100.0%
Close Punctuation
ValueCountFrequency (%)
)9
100.0%
Open Punctuation
ValueCountFrequency (%)
(8
100.0%
Control
ValueCountFrequency (%)
4
100.0%
Final Punctuation
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin28264
81.2%
Common6551
 
18.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3664
13.0%
t2988
10.6%
a2259
 
8.0%
i2007
 
7.1%
n1963
 
6.9%
s1943
 
6.9%
o1924
 
6.8%
r1584
 
5.6%
h1471
 
5.2%
d992
 
3.5%
Other values (38)7469
26.4%
Common
ValueCountFrequency (%)
5798
88.5%
.353
 
5.4%
,142
 
2.2%
'132
 
2.0%
"34
 
0.5%
-32
 
0.5%
?21
 
0.3%
)9
 
0.1%
(8
 
0.1%
/6
 
0.1%
Other values (8)16
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII34811
> 99.9%
Punctuation4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5798
16.7%
e3664
 
10.5%
t2988
 
8.6%
a2259
 
6.5%
i2007
 
5.8%
n1963
 
5.6%
s1943
 
5.6%
o1924
 
5.5%
r1584
 
4.6%
h1471
 
4.2%
Other values (55)9210
26.5%
Punctuation
ValueCountFrequency (%)
4
100.0%

study_2_add_info
Categorical

HIGH CARDINALITY
MISSING

Distinct: 109
Distinct (%): 75.7%
Missing: 355
Missing (%): 71.1%
Memory size: 4.0 KiB

Value | Count
na | 29
Na | 8
Some people may not want to be contacted privately. | 1
The researchers should have also tested liberal/progressive users who posted links to liberal websites that were also allegedly "untrustworthy." | 1
The part about not telling people that they are part of a research study because misinformation is spread too much and now the participants will likely believe things that are untrue. | 1
Other values (104) | 104

Length

Max length: 403
Median length: 195.5
Mean length: 67.75
Min length: 2

Characters and Unicode

Total characters: 9756
Distinct characters: 65
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 107
Unique (%): 74.3%

Sample

1st row: na
2nd row: Concerns over the possibility of the researchers having their own political agenda. Yet fake news is a major problem. What social media really is when mass sharing news (political news), is simple propaganda from the left and right.
3rd row: Full disclosure if research intent
4th row: How do you pick a representative sample on a non representative platform.
5th row: Nz

Common Values

Value | Count | Frequency (%)
na | 29 | 5.8%
Na | 8 | 1.6%
Some people may not want to be contacted privately. | 1 | 0.2%
The researchers should have also tested liberal/progressive users who posted links to liberal websites that were also allegedly "untrustworthy." | 1 | 0.2%
The part about not telling people that they are part of a research study because misinformation is spread too much and now the participants will likely believe things that are untrue. | 1 | 0.2%
Of course there is always the concern about funding and bias. I wonder if because of the private messaging whether there was any supplemental back-and-forth between the unwilling participant and someone behind the study which has potential for some form of abuse or corruption of data. | 1 | 0.2%
I'd like to know the exact messages they are sending. | 1 | 0.2%
you need people consent to do a study on them | 1 | 0.2%
If I found out who labeled the misinformation(same people who labeled Hunter's laptop?) and if I found out that left leaning were also studied. | 1 | 0.2%
N/A/ | 1 | 0.2%
Other values (99) | 99 | 19.8%
(Missing) | 355 | 71.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the102
 
5.9%
i47
 
2.7%
to40
 
2.3%
na39
 
2.3%
of36
 
2.1%
a36
 
2.1%
they36
 
2.1%
if34
 
2.0%
would31
 
1.8%
that30
 
1.7%
Other values (507)1287
74.9%

Most occurring characters

ValueCountFrequency (%)
1600
16.4%
e999
 
10.2%
t809
 
8.3%
a645
 
6.6%
o543
 
5.6%
i535
 
5.5%
s529
 
5.4%
n512
 
5.2%
r432
 
4.4%
h403
 
4.1%
Other values (55)2749
28.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7769
79.6%
Space Separator1600
 
16.4%
Uppercase Letter181
 
1.9%
Other Punctuation169
 
1.7%
Dash Punctuation10
 
0.1%
Open Punctuation8
 
0.1%
Close Punctuation8
 
0.1%
Decimal Number7
 
0.1%
Control3
 
< 0.1%
Final Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e999
12.9%
t809
 
10.4%
a645
 
8.3%
o543
 
7.0%
i535
 
6.9%
s529
 
6.8%
n512
 
6.6%
r432
 
5.6%
h403
 
5.2%
l330
 
4.2%
Other values (16)2032
26.2%
Uppercase Letter
ValueCountFrequency (%)
I72
39.8%
T20
 
11.0%
N20
 
11.0%
A10
 
5.5%
W9
 
5.0%
H7
 
3.9%
S6
 
3.3%
D5
 
2.8%
R4
 
2.2%
P4
 
2.2%
Other values (10)24
 
13.3%
Other Punctuation
ValueCountFrequency (%)
.87
51.5%
,32
 
18.9%
'19
 
11.2%
?11
 
6.5%
/10
 
5.9%
"8
 
4.7%
!1
 
0.6%
;1
 
0.6%
Decimal Number
ValueCountFrequency (%)
04
57.1%
12
28.6%
21
 
14.3%
Open Punctuation
ValueCountFrequency (%)
(7
87.5%
[1
 
12.5%
Close Punctuation
ValueCountFrequency (%)
)7
87.5%
]1
 
12.5%
Space Separator
ValueCountFrequency (%)
1600
100.0%
Dash Punctuation
ValueCountFrequency (%)
-10
100.0%
Control
ValueCountFrequency (%)
3
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7950
81.5%
Common1806
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e999
12.6%
t809
 
10.2%
a645
 
8.1%
o543
 
6.8%
i535
 
6.7%
s529
 
6.7%
n512
 
6.4%
r432
 
5.4%
h403
 
5.1%
l330
 
4.2%
Other values (36)2213
27.8%
Common
ValueCountFrequency (%)
1600
88.6%
.87
 
4.8%
,32
 
1.8%
'19
 
1.1%
?11
 
0.6%
-10
 
0.6%
/10
 
0.6%
"8
 
0.4%
(7
 
0.4%
)7
 
0.4%
Other values (9)15
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII9755
> 99.9%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1600
16.4%
e999
 
10.2%
t809
 
8.3%
a645
 
6.6%
o543
 
5.6%
i535
 
5.5%
s529
 
5.4%
n512
 
5.2%
r432
 
4.4%
h403
 
4.1%
Other values (54)2748
28.2%
Punctuation
ValueCountFrequency (%)
1
100.0%

study_3_ethic_acc
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value | Count
Completely accepatable | 249
Somewhat acceptable | 134
Neutral | 56
Somewhat unacceptable | 39
Completely unacceptable | 21

Length

Max length: 23
Median length: 22
Mean length: 19.4749499
Min length: 7

Characters and Unicode

Total characters: 9718
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Neutral
2nd row: Completely accepatable
3rd row: Somewhat acceptable
4th row: Somewhat acceptable
5th row: Completely unacceptable

Common Values

ValueCountFrequency (%)
Value | Count | Frequency (%)
Completely accepatable | 249 | 49.9%
Somewhat acceptable | 134 | 26.9%
Neutral | 56 | 11.2%
Somewhat unacceptable | 39 | 7.8%
Completely unacceptable | 21 | 4.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
completely270
28.7%
accepatable249
26.4%
somewhat173
18.4%
acceptable134
14.2%
unacceptable60
 
6.4%
neutral56
 
5.9%
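
The word counts above list "accepatable" (249) separately from "acceptable" (134) because this variable stores the label "Completely accepatable", whereas the other study_*_ethic_acc variables use "Completely acceptable". A minimal sketch of how the labels might be harmonised before comparing studies follows. The names `SPELLING_FIXES`, `LIKERT_SCORES` and `likert_score` are hypothetical, and the 1 to 5 scoring is an assumption for illustration, not something the report defines.

```python
# Assumed ordinal coding for the five answer labels (not defined by the report).
LIKERT_SCORES = {
    "Completely unacceptable": 1,
    "Somewhat unacceptable": 2,
    "Neutral": 3,
    "Somewhat acceptable": 4,
    "Completely acceptable": 5,
}

# Variant spelling observed in study_3_ethic_acc, mapped to the canonical label.
SPELLING_FIXES = {"Completely accepatable": "Completely acceptable"}


def likert_score(label):
    """Return the assumed 1-5 score for a label, fixing the known misspelling."""
    cleaned = label.strip()
    canonical = SPELLING_FIXES.get(cleaned, cleaned)
    return LIKERT_SCORES[canonical]


print(likert_score("Completely accepatable"))
print(likert_score("Neutral"))
```

An unknown label raises `KeyError`, which is deliberate here so that further spelling variants surface instead of being silently scored.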

Most occurring characters

ValueCountFrequency (%)
e1655
17.0%
a1364
14.0%
l1039
10.7%
t942
9.7%
c886
9.1%
p713
7.3%
m443
 
4.6%
b443
 
4.6%
443
 
4.6%
o443
 
4.6%
Other values (9)1347
13.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8776
90.3%
Uppercase Letter499
 
5.1%
Space Separator443
 
4.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1655
18.9%
a1364
15.5%
l1039
11.8%
t942
10.7%
c886
10.1%
p713
8.1%
m443
 
5.0%
b443
 
5.0%
o443
 
5.0%
y270
 
3.1%
Other values (5)578
 
6.6%
Uppercase Letter
ValueCountFrequency (%)
C270
54.1%
S173
34.7%
N56
 
11.2%
Space Separator
ValueCountFrequency (%)
443
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9275
95.4%
Common443
 
4.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1655
17.8%
a1364
14.7%
l1039
11.2%
t942
10.2%
c886
9.6%
p713
7.7%
m443
 
4.8%
b443
 
4.8%
o443
 
4.8%
C270
 
2.9%
Other values (8)1077
11.6%
Common
ValueCountFrequency (%)
443
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9718
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1655
17.0%
a1364
14.0%
l1039
10.7%
t942
9.7%
c886
9.1%
p713
7.3%
m443
 
4.6%
b443
 
4.6%
443
 
4.6%
o443
 
4.6%
Other values (9)1347
13.9%

study_3_conc
Categorical

HIGH CARDINALITY
MISSING

Distinct: 247
Distinct (%): 90.1%
Missing: 225
Missing (%): 45.1%
Memory size: 4.0 KiB

Value | Count
na | 21
Na | 8
This is more acceptable because participants are informed ahead of time about the use of their data. | 1
Bribery to sway someone's oppinion is not very ethical. Worse the research study has no value, because the results of the data was tainted or biased. | 1
This type of information gathering requires the Twitter user to agree to their data being used, so it is not unlike any other online study, so this is okay with me as far as ethics. By users agreeing to their data being used, my thought is it probably rules out all of the bots that are so prevalent on Twitter. | 1
Other values (242) | 242

Length

Max length: 732
Median length: 154
Mean length: 107.7591241
Min length: 1

Characters and Unicode

Total characters: 29526
Distinct characters: 74
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 245
Unique (%): 89.4%

Sample

1st row: I find this is ethical as long as participants were fully aware of what was being monitored. The results are interesting! No concerns.
2nd row: As long as the Facebook users were informed that they would be in a study I feel it is fair. It was up to the users whether they wanted to participate or not. Also, they were encouraged, but not actually made to Like the Facebook study.
3rd row: The web extension being used was invasive, even if it was used with consent. The people participating in the study are not educated enough on exactly how much information the web extension was taking.
4th row: na
5th row: The researchers seem in some ways to try manipulating political viewpoints in a segment of the population for the sake of science.

Common Values

Value | Count | Frequency (%)
na | 21 | 4.2%
Na | 8 | 1.6%
This is more acceptable because participants are informed ahead of time about the use of their data. | 1 | 0.2%
Bribery to sway someone's oppinion is not very ethical. Worse the research study has no value, because the results of the data was tainted or biased. | 1 | 0.2%
This type of information gathering requires the Twitter user to agree to their data being used, so it is not unlike any other online study, so this is okay with me as far as ethics. By users agreeing to their data being used, my thought is it probably rules out all of the bots that are so prevalent on Twitter. | 1 | 0.2%
The study was completely transparent. | 1 | 0.2%
Most importantly, the researchers got the approval of their users first. At least the users were aware that they were taking part in a study even though, in my mind, they were severely underpaid. | 1 | 0.2%
Researchers were up front and honest and people got to pick their reward. | 1 | 0.2%
I feel that since all the participants were informed of the process and offered compensation and they willingly participated, there are not any unethical practices used in the process. | 1 | 0.2%
My only concern is that the browser extension allowed researches to see ALL of the users' posts. I think this would be okay if it was explicitly consented to by the participants, though the above doesn't specify. | 1 | 0.2%
Other values (237) | 237 | 47.5%
(Missing) | 225 | 45.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the336
 
6.5%
to164
 
3.2%
i130
 
2.5%
of120
 
2.3%
they106
 
2.1%
were105
 
2.0%
a98
 
1.9%
and98
 
1.9%
that96
 
1.9%
study93
 
1.8%
Other values (902)3791
73.8%

Most occurring characters

ValueCountFrequency (%)
4940
16.7%
e3210
10.9%
t2556
 
8.7%
a1844
 
6.2%
i1723
 
5.8%
o1663
 
5.6%
s1589
 
5.4%
n1567
 
5.3%
r1354
 
4.6%
h1214
 
4.1%
Other values (64)7866
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter23582
79.9%
Space Separator4940
 
16.7%
Other Punctuation520
 
1.8%
Uppercase Letter434
 
1.5%
Decimal Number20
 
0.1%
Dash Punctuation10
 
< 0.1%
Currency Symbol8
 
< 0.1%
Close Punctuation5
 
< 0.1%
Open Punctuation5
 
< 0.1%
Control1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3210
13.6%
t2556
10.8%
a1844
 
7.8%
i1723
 
7.3%
o1663
 
7.1%
s1589
 
6.7%
n1567
 
6.6%
r1354
 
5.7%
h1214
 
5.1%
l841
 
3.6%
Other values (16)6021
25.5%
Uppercase Letter
ValueCountFrequency (%)
I161
37.1%
T103
23.7%
A33
 
7.6%
N16
 
3.7%
P15
 
3.5%
S14
 
3.2%
E11
 
2.5%
F11
 
2.5%
B9
 
2.1%
L8
 
1.8%
Other values (14)53
 
12.2%
Other Punctuation
ValueCountFrequency (%)
.301
57.9%
,127
24.4%
'62
 
11.9%
"12
 
2.3%
?10
 
1.9%
/3
 
0.6%
!2
 
0.4%
;1
 
0.2%
:1
 
0.2%
%1
 
0.2%
Decimal Number
ValueCountFrequency (%)
06
30.0%
56
30.0%
84
20.0%
32
 
10.0%
21
 
5.0%
11
 
5.0%
Dash Punctuation
ValueCountFrequency (%)
-8
80.0%
2
 
20.0%
Space Separator
ValueCountFrequency (%)
4940
100.0%
Currency Symbol
ValueCountFrequency (%)
$8
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Control
ValueCountFrequency (%)
1
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin24016
81.3%
Common5510
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3210
13.4%
t2556
 
10.6%
a1844
 
7.7%
i1723
 
7.2%
o1663
 
6.9%
s1589
 
6.6%
n1567
 
6.5%
r1354
 
5.6%
h1214
 
5.1%
l841
 
3.5%
Other values (40)6455
26.9%
Common
ValueCountFrequency (%)
4940
89.7%
.301
 
5.5%
,127
 
2.3%
'62
 
1.1%
"12
 
0.2%
?10
 
0.2%
-8
 
0.1%
$8
 
0.1%
06
 
0.1%
56
 
0.1%
Other values (14)30
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII29523
> 99.9%
Punctuation3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4940
16.7%
e3210
10.9%
t2556
 
8.7%
a1844
 
6.2%
i1723
 
5.8%
o1663
 
5.6%
s1589
 
5.4%
n1567
 
5.3%
r1354
 
4.6%
h1214
 
4.1%
Other values (62)7863
26.6%
Punctuation
ValueCountFrequency (%)
2
66.7%
1
33.3%

study_3_add_info
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 92
Distinct (%): 71.9%
Missing: 371
Missing (%): 74.3%
Memory size: 4.0 KiB

Value | Count
na | 25
Na | 11
no | 2
NA. | 2
I'd like to know how they can make sure the browser extension only stays on peoples browsers for 8 weeks. Does the user have to remove it themselves? Some people are not so great with technology and wouldn't be able to figure it out and thus the researchers could collect far more data than promised. | 1
Other values (87) | 87

Length

Max length: 300
Median length: 186
Mean length: 70.265625
Min length: 2

Characters and Unicode

Total characters: 8994
Distinct characters: 61
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 88
Unique (%): 68.8%

Sample

1st row: Making the source code for the web extension publicly available to have complete transparency over what the extension was doing.
2nd row: na
3rd row: Since the study is revealed as a study, I think it’s ethical, but mostly nonsensical
4th row: Na
5th row: Na

Common Values

Value | Count | Frequency (%)
na | 25 | 5.0%
Na | 11 | 2.2%
no | 2 | 0.4%
NA. | 2 | 0.4%
I'd like to know how they can make sure the browser extension only stays on peoples browsers for 8 weeks. Does the user have to remove it themselves? Some people are not so great with technology and wouldn't be able to figure it out and thus the researchers could collect far more data than promised. | 1 | 0.2%
I would not click on the ad | 1 | 0.2%
I think the information is going to be skewed based upon what the user thinks the researcher is looking for. They are also more likely to click on political sites because they want to make sure that the researcher is gathering enough data from them. | 1 | 0.2%
What was collected by the extension would be valuable. | 1 | 0.2%
So many ways for people to get manipulated with these studies I agree with this one being one of the good ones. | 1 | 0.2%
Everyone was informed, so i think this is a good study. | 1 | 0.2%
Other values (82) | 82 | 16.4%
(Missing) | 371 | 74.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the125
 
7.7%
to48
 
3.0%
i42
 
2.6%
na39
 
2.4%
would38
 
2.3%
of30
 
1.8%
that28
 
1.7%
was26
 
1.6%
it24
 
1.5%
study24
 
1.5%
Other values (469)1198
73.9%
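
The word counts above include 39 occurrences of "na", which reflects placeholder answers (na, Na, NA., and similar) being counted as real responses rather than as missing values. The following is a minimal sketch of how such placeholders might be folded into missing data before profiling. The pattern and the helper names `is_placeholder` and `clean_responses` are hypothetical, and treating a bare "no" or "none" as a placeholder is an assumption that should be checked against the survey wording.

```python
import re

# Matches answers consisting only of "na", "n/a", "no" or "none" (any case),
# optionally followed by stray dots, slashes or whitespace, e.g. "NA." or "N/A/".
# ASSUMPTION: a bare "no"/"none" is a non-answer for these open-text questions.
PLACEHOLDER = re.compile(r"^\s*(n/?a|no|none)[\s./]*$", re.IGNORECASE)


def is_placeholder(text):
    """Return True for missing values and for placeholder-only answers."""
    return text is None or bool(PLACEHOLDER.match(text))


def clean_responses(responses):
    """Replace placeholder answers with None and strip surrounding whitespace."""
    return [None if is_placeholder(r) else r.strip() for r in responses]


# Illustrative input only: three values taken from this variable.
cleaned = clean_responses(["Na", "I would not click on the ad", "NA."])
print(cleaned)
```

With placeholders recoded this way, the Missing and Distinct figures for the open-text variables would change, so any recoding step should be reported alongside the statistics.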

Most occurring characters

ValueCountFrequency (%)
1509
16.8%
e962
 
10.7%
t770
 
8.6%
a600
 
6.7%
o562
 
6.2%
s458
 
5.1%
n450
 
5.0%
i441
 
4.9%
r392
 
4.4%
h358
 
4.0%
Other values (51)2492
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7170
79.7%
Space Separator1509
 
16.8%
Other Punctuation146
 
1.6%
Uppercase Letter138
 
1.5%
Decimal Number10
 
0.1%
Close Punctuation5
 
0.1%
Open Punctuation5
 
0.1%
Dash Punctuation4
 
< 0.1%
Control3
 
< 0.1%
Currency Symbol3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e962
13.4%
t770
10.7%
a600
 
8.4%
o562
 
7.8%
s458
 
6.4%
n450
 
6.3%
i441
 
6.2%
r392
 
5.5%
h358
 
5.0%
l291
 
4.1%
Other values (15)1886
26.3%
Uppercase Letter
ValueCountFrequency (%)
I57
41.3%
N19
 
13.8%
T13
 
9.4%
A11
 
8.0%
W7
 
5.1%
S5
 
3.6%
D4
 
2.9%
M4
 
2.9%
H3
 
2.2%
E3
 
2.2%
Other values (8)12
 
8.7%
Other Punctuation
ValueCountFrequency (%)
.83
56.8%
,34
23.3%
'21
 
14.4%
?5
 
3.4%
!1
 
0.7%
&1
 
0.7%
/1
 
0.7%
Decimal Number
ValueCountFrequency (%)
55
50.0%
83
30.0%
11
 
10.0%
01
 
10.0%
Space Separator
ValueCountFrequency (%)
1509
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Dash Punctuation
ValueCountFrequency (%)
-4
100.0%
Control
ValueCountFrequency (%)
3
100.0%
Currency Symbol
ValueCountFrequency (%)
$3
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7308
81.3%
Common1686
 
18.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e962
13.2%
t770
 
10.5%
a600
 
8.2%
o562
 
7.7%
s458
 
6.3%
n450
 
6.2%
i441
 
6.0%
r392
 
5.4%
h358
 
4.9%
l291
 
4.0%
Other values (33)2024
27.7%
Common
ValueCountFrequency (%)
1509
89.5%
.83
 
4.9%
,34
 
2.0%
'21
 
1.2%
55
 
0.3%
)5
 
0.3%
(5
 
0.3%
?5
 
0.3%
-4
 
0.2%
3
 
0.2%
Other values (8)12
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII8993
> 99.9%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1509
16.8%
e962
 
10.7%
t770
 
8.6%
a600
 
6.7%
o562
 
6.2%
s458
 
5.1%
n450
 
5.0%
i441
 
4.9%
r392
 
4.4%
h358
 
4.0%
Other values (50)2491
27.7%
Punctuation
ValueCountFrequency (%)
1
100.0%

study_4_ethic_acc
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value | Count
Somewhat acceptable | 121
Completely acceptable | 115
Somewhat unacceptable | 110
Neutral | 88
Completely unacceptable | 65

Length

Max length: 23
Median length: 21
Mean length: 18.30661323
Min length: 7

Characters and Unicode

Total characters: 9135
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Neutral
2nd row: Neutral
3rd row: Somewhat unacceptable
4th row: Somewhat unacceptable
5th row: Completely acceptable

Common Values

ValueCountFrequency (%)
Value | Count | Frequency (%)
Somewhat acceptable | 121 | 24.2%
Completely acceptable | 115 | 23.0%
Somewhat unacceptable | 110 | 22.0%
Neutral | 88 | 17.6%
Completely unacceptable | 65 | 13.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
acceptable236
25.9%
somewhat231
25.4%
completely180
19.8%
unacceptable175
19.2%
neutral88
 
9.7%

Most occurring characters

ValueCountFrequency (%)
e1501
16.4%
a1141
12.5%
t910
10.0%
l859
9.4%
c822
9.0%
p591
 
6.5%
b411
 
4.5%
m411
 
4.5%
411
 
4.5%
o411
 
4.5%
Other values (9)1667
18.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8225
90.0%
Uppercase Letter499
 
5.5%
Space Separator411
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1501
18.2%
a1141
13.9%
t910
11.1%
l859
10.4%
c822
10.0%
p591
 
7.2%
b411
 
5.0%
m411
 
5.0%
o411
 
5.0%
u263
 
3.2%
Other values (5)905
11.0%
Uppercase Letter
ValueCountFrequency (%)
S231
46.3%
C180
36.1%
N88
 
17.6%
Space Separator
ValueCountFrequency (%)
411
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8724
95.5%
Common411
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1501
17.2%
a1141
13.1%
t910
10.4%
l859
9.8%
c822
9.4%
p591
 
6.8%
b411
 
4.7%
m411
 
4.7%
o411
 
4.7%
u263
 
3.0%
Other values (8)1404
16.1%
Common
ValueCountFrequency (%)
411
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9135
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1501
16.4%
a1141
12.5%
t910
10.0%
l859
9.4%
c822
9.0%
p591
 
6.5%
b411
 
4.5%
m411
 
4.5%
411
 
4.5%
o411
 
4.5%
Other values (9)1667
18.2%

study_4_conc
Categorical

HIGH CARDINALITY
MISSING

Distinct: 267
Distinct (%): 90.2%
Missing: 203
Missing (%): 40.7%
Memory size: 4.0 KiB

Value | Count
na | 24
Na | 7
Since this was done in a public setting, I feel like it is more ethically acceptable than the other study that was similar, but done through private messages. | 1
I think any time you are purposely deceiving people and not informing them about what the true reasoning/outliner is, there is an ethical dilemma. | 1
Wow, just goes to show that social media feeds on itself. A public message is determined immediately to be hostile - both to the original poster and their followers who maintain similar opinions. I have seen this play out on facebook where the language gets so violent and inappropriate, I have snoozed people. This happened rather frequently with covid responses and supporters of T's big lie. | 1
Other values (262) | 262

Length

Max length: 822
Median length: 222
Mean length: 122.1756757
Min length: 2

Characters and Unicode

Total characters: 36164
Distinct characters: 77
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 265
Unique (%): 89.5%

Sample

1st row: I am uncertain how I feel completely about a researcher creating a fake account. However I do understand the desire to protect themselves and to not give away their actions as being part of a study. This misinformation needed to be corrected for the public but it opened the original poster to toxicity. The OP may not have known it was incorrect.
2nd row: Users were not aware of what was going on so they were possibly more honest in their opinions because they had no idea they were being analyzed.
3rd row: na
4th row: Many of the people that have large political followings on twitter (and many who don't) often know already the news they are sharing is fake. It's political partisanship and the spreading of propaganda. Some might post fake news only to gain more followers (the masses) if they believe it serves that end.
5th row: This study is unethfull disclosure of intent of research.ical because they were not informed of the research study

Common Values

Value | Count | Frequency (%)
na | 24 | 4.8%
Na | 7 | 1.4%
Since this was done in a public setting, I feel like it is more ethically acceptable than the other study that was similar, but done through private messages. | 1 | 0.2%
I think any time you are purposely deceiving people and not informing them about what the true reasoning/outliner is, there is an ethical dilemma. | 1 | 0.2%
Wow, just goes to show that social media feeds on itself. A public message is determined immediately to be hostile - both to the original poster and their followers who maintain similar opinions. I have seen this play out on facebook where the language gets so violent and inappropriate, I have snoozed people. This happened rather frequently with covid responses and supporters of T's big lie. | 1 | 0.2%
I have found that some "fact-checking" sites end up having wrong information as well. A great case is how the idea that masks don't work spread. Many studies have been conducted and some very reputable scientific organizations have studies on their websites that say masks do not work, but the data is typically very small or the type of mask used was a very thin cloth mask, but it gives fuel to people who spread the false information that masking does not work. | 1 | 0.2%
I feel like this is acceptable because when you sign up for social media, if you are posting something publicly it is assumed that anyone can look at these posts and reply to them | 1 | 0.2%
This is all public, so I have no issue with the researchers observing this. | 1 | 0.2%
I personally have no issue with this study, though objectively it seems a little dubious to study individuals without their knowledge like this. | 1 | 0.2%
I object to most studies in which users are not informed that they are being studied and that they are being manipulated. Secondly, I would like to see the type of reply that was originally sent. This study almost directly contradicts the results reached in the last study in which people deleted their hate speech because they got a link to a fact checking site along with an empathetic response. Lastly, I now do my own fact checking after learning that quite a few of these fact checkers are deliberately manipulating and distorting info because of their own bias. I simply no longer trust the "fact checkers." | 1 | 0.2%
Other values (257) | 257 | 51.5%
(Missing) | 203 | 40.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the302
 
4.8%
i156
 
2.5%
to152
 
2.4%
that148
 
2.4%
of148
 
2.4%
they137
 
2.2%
a137
 
2.2%
is121
 
1.9%
and112
 
1.8%
not111
 
1.8%
Other values (1094)4764
75.8%

Most occurring characters

ValueCountFrequency (%)
6069
16.8%
e3621
 
10.0%
t3134
 
8.7%
a2366
 
6.5%
i2149
 
5.9%
n2056
 
5.7%
o2034
 
5.6%
s1859
 
5.1%
r1541
 
4.3%
h1495
 
4.1%
Other values (67)9840
27.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter28746
79.5%
Space Separator6069
 
16.8%
Other Punctuation721
 
2.0%
Uppercase Letter516
 
1.4%
Dash Punctuation34
 
0.1%
Decimal Number34
 
0.1%
Open Punctuation10
 
< 0.1%
Control9
 
< 0.1%
Close Punctuation9
 
< 0.1%
Final Punctuation6
 
< 0.1%
Other values (3)10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e3621
12.6%
t3134
10.9%
a2366
 
8.2%
i2149
 
7.5%
n2056
 
7.2%
o2034
 
7.1%
s1859
 
6.5%
r1541
 
5.4%
h1495
 
5.2%
d1009
 
3.5%
Other values (16)7482
26.0%
Uppercase Letter
ValueCountFrequency (%)
I202
39.1%
T112
21.7%
A32
 
6.2%
P23
 
4.5%
S22
 
4.3%
N18
 
3.5%
M14
 
2.7%
W13
 
2.5%
O11
 
2.1%
D8
 
1.6%
Other values (12)61
 
11.8%
Other Punctuation
ValueCountFrequency (%)
.352
48.8%
,176
24.4%
'107
 
14.8%
"46
 
6.4%
?13
 
1.8%
/11
 
1.5%
%9
 
1.2%
&3
 
0.4%
:2
 
0.3%
;1
 
0.1%
Decimal Number
ValueCountFrequency (%)
213
38.2%
012
35.3%
72
 
5.9%
32
 
5.9%
52
 
5.9%
41
 
2.9%
91
 
2.9%
11
 
2.9%
Math Symbol
ValueCountFrequency (%)
=4
80.0%
~1
 
20.0%
Space Separator
ValueCountFrequency (%)
6069
100.0%
Dash Punctuation
ValueCountFrequency (%)
-34
100.0%
Open Punctuation
ValueCountFrequency (%)
(10
100.0%
Control
ValueCountFrequency (%)
9
100.0%
Close Punctuation
ValueCountFrequency (%)
)9
100.0%
Final Punctuation
ValueCountFrequency (%)
6
100.0%
Connector Punctuation
ValueCountFrequency (%)
_4
100.0%
Initial Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin29262
80.9%
Common6902
 
19.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3621
12.4%
t3134
 
10.7%
a2366
 
8.1%
i2149
 
7.3%
n2056
 
7.0%
o2034
 
7.0%
s1859
 
6.4%
r1541
 
5.3%
h1495
 
5.1%
d1009
 
3.4%
Other values (38)7998
27.3%
Common
ValueCountFrequency (%)
6069
87.9%
.352
 
5.1%
,176
 
2.5%
'107
 
1.6%
"46
 
0.7%
-34
 
0.5%
?13
 
0.2%
213
 
0.2%
012
 
0.2%
/11
 
0.2%
Other values (19)69
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII36157
> 99.9%
Punctuation7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6069
16.8%
e3621
 
10.0%
t3134
 
8.7%
a2366
 
6.5%
i2149
 
5.9%
n2056
 
5.7%
o2034
 
5.6%
s1859
 
5.1%
r1541
 
4.3%
h1495
 
4.1%
Other values (65)9833
27.2%
Punctuation
ValueCountFrequency (%)
6
85.7%
1
 
14.3%

study_4_add_info
Categorical

HIGH CARDINALITY
MISSING

Distinct104
Distinct (%)72.2%
Missing355
Missing (%)71.1%
Memory size4.0 KiB
na
32 
Na
 
8
no
 
3
I don't think it's acceptable because non human bots were used and even though they linked to a fact checking website, they are still influencing people and that will cause more divisiveness.
 
1
The researchers relied on "fact checkers" to determine if the information was "fake" or not. If there existed a bias or margin of error in the fact checker's process then the researchers would be working with wrong information themselves.
 
1
Other values (99)
99 

Length

Max length580
Median length189
Mean length71.40972222
Min length2

Characters and Unicode

Total characters10283
Distinct characters60
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique101
Unique (%)70.1%

Sample

1st rowThe researchers had a purpose in seeing the responses of those interacting with the post. I do not agree with how it was done entirely however I do not know a better way to get the results that were desired.
2nd rowna
3rd rowFull disclosure of intent of researchers.
4th rowPlease see comments from the first study
5th rowNa

Common Values

Value | Count | Frequency (%)
na | 32 | 6.4%
Na | 8 | 1.6%
no | 3 | 0.6%
I don't think it's acceptable because non human bots were used and even though they linked to a fact checking website, they are still influencing people and that will cause more divisiveness. | 1 | 0.2%
The researchers relied on "fact checkers" to determine if the information was "fake" or not. If there existed a bias or margin of error in the fact checker's process then the researchers would be working with wrong information themselves. | 1 | 0.2%
I find it acceptable because this method is helping to slow the spread of misinformation | 1 | 0.2%
The outcome of this study has also resulted in negative behavior by the participants due to the experiment. | 1 | 0.2%
Completely unethical. Customers were unaware of the study and fake accounts were made. No one was compensated or aware. | 1 | 0.2%
I would like to know example content of the tweets and be shown an example of the bot accounts - it's hard to know exactly how individuals would react to someone pointing out their wrong without knowing the profile of the person pointing out the error. | 1 | 0.2%
None. | 1 | 0.2%
Other values (94) | 94 | 18.8%
(Missing) | 355 | 71.1%
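
The table above counts "na", "Na", "no" and "None." as ordinary distinct values, separate from the 355 truly missing cells. A minimal sketch of how such placeholder answers might be folded into missing values before profiling is shown below. The function name `normalise_placeholders` and the `PLACEHOLDERS` set are assumptions made for illustration; whether "no" should count as a placeholder or as a genuine answer is a judgment call for the analyst.

```python
import pandas as pd

# Assumed placeholder tokens (compared after lowercasing and stripping a
# trailing period). This list is illustrative, not taken from the report.
PLACEHOLDERS = {"na", "n/a", "no", "none"}


def normalise_placeholders(series: pd.Series) -> pd.Series:
    """Return a copy where placeholder answers are replaced by missing values."""
    cleaned = series.str.strip()
    key = cleaned.str.lower().str.rstrip(".")
    is_placeholder = key.isin(PLACEHOLDERS)  # False for already-missing cells
    return cleaned.mask(is_placeholder)


raw = pd.Series(
    ["na", "Na", "no", "None.", "Full disclosure of intent of researchers.", None]
)
result = normalise_placeholders(raw)
```

After this step only substantive free-text answers remain as non-missing, which would lower the distinct count and raise the missing count reported for the variable.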

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the98
 
5.3%
to60
 
3.2%
i44
 
2.4%
na43
 
2.3%
of37
 
2.0%
it36
 
1.9%
a33
 
1.8%
if32
 
1.7%
they31
 
1.7%
that29
 
1.6%
Other values (534)1422
76.2%

Most occurring characters

ValueCountFrequency (%)
1743
17.0%
e1097
 
10.7%
t899
 
8.7%
a650
 
6.3%
o585
 
5.7%
n525
 
5.1%
i524
 
5.1%
s524
 
5.1%
h449
 
4.4%
r438
 
4.3%
Other values (50)2849
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8164
79.4%
Space Separator1743
 
17.0%
Other Punctuation181
 
1.8%
Uppercase Letter171
 
1.7%
Dash Punctuation10
 
0.1%
Close Punctuation4
 
< 0.1%
Open Punctuation4
 
< 0.1%
Decimal Number4
 
< 0.1%
Control2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1097
13.4%
t899
11.0%
a650
 
8.0%
o585
 
7.2%
n525
 
6.4%
i524
 
6.4%
s524
 
6.4%
h449
 
5.5%
r438
 
5.4%
l311
 
3.8%
Other values (16)2162
26.5%
Uppercase Letter
ValueCountFrequency (%)
I64
37.4%
T25
 
14.6%
N20
 
11.7%
W11
 
6.4%
R6
 
3.5%
A6
 
3.5%
P6
 
3.5%
B5
 
2.9%
M4
 
2.3%
F4
 
2.3%
Other values (10)20
 
11.7%
Other Punctuation
ValueCountFrequency (%)
.99
54.7%
,29
 
16.0%
'26
 
14.4%
"14
 
7.7%
?7
 
3.9%
/4
 
2.2%
;2
 
1.1%
Decimal Number
ValueCountFrequency (%)
03
75.0%
21
 
25.0%
Space Separator
ValueCountFrequency (%)
1743
100.0%
Dash Punctuation
ValueCountFrequency (%)
-10
100.0%
Close Punctuation
ValueCountFrequency (%)
)4
100.0%
Open Punctuation
ValueCountFrequency (%)
(4
100.0%
Control
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8335
81.1%
Common1948
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1097
13.2%
t899
 
10.8%
a650
 
7.8%
o585
 
7.0%
n525
 
6.3%
i524
 
6.3%
s524
 
6.3%
h449
 
5.4%
r438
 
5.3%
l311
 
3.7%
Other values (36)2333
28.0%
Common
ValueCountFrequency (%)
1743
89.5%
.99
 
5.1%
,29
 
1.5%
'26
 
1.3%
"14
 
0.7%
-10
 
0.5%
?7
 
0.4%
)4
 
0.2%
(4
 
0.2%
/4
 
0.2%
Other values (4)8
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII10283
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1743
17.0%
e1097
 
10.7%
t899
 
8.7%
a650
 
6.3%
o585
 
5.7%
n525
 
5.1%
i524
 
5.1%
s524
 
5.1%
h449
 
4.4%
r438
 
4.3%
Other values (50)2849
27.7%

design_cont
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
165 
Moderately important
124 
Extremely important
105 
Slightly important
67 
Not at all important
38 

Length

Max length20
Median length19
Mean length17.53707415
Min length14

Characters and Unicode

Total characters8751
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowExtremely important
4th rowModerately important
5th rowExtremely important

Common Values

Value | Count | Frequency (%)
Very important | 165 | 33.1%
Moderately important | 124 | 24.8%
Extremely important | 105 | 21.0%
Slightly important | 67 | 13.4%
Not at all important | 38 | 7.6%
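
The five labels of design_cont form an ordered scale, but the report treats them as unordered categories and sorts them by count. The sketch below, which assumes pandas is available, rebuilds the column from the counts listed above and encodes it as an ordered categorical so that the table follows the scale and an ordinal median can be read off. The names `LEVELS`, `counts` and `median_label` are hypothetical.

```python
import pandas as pd

# Scale order from least to most important (the ordering is an assumption
# based on the label wording; the report itself does not state it).
LEVELS = [
    "Not at all important",
    "Slightly important",
    "Moderately important",
    "Very important",
    "Extremely important",
]

# Counts for design_cont as listed in the Common Values table.
counts = {
    "Very important": 165,
    "Moderately important": 124,
    "Extremely important": 105,
    "Slightly important": 67,
    "Not at all important": 38,
}

# Rebuild a 499-row column purely for illustration.
responses = pd.Series([label for label, n in counts.items() for _ in range(n)])
ordered = pd.Series(pd.Categorical(responses, categories=LEVELS, ordered=True))

# Frequency table in scale order rather than by descending count.
table = ordered.value_counts().sort_index()

# Ordinal median: the middle response when sorted along the scale.
median_label = LEVELS[int(ordered.cat.codes.median())]
```

The same encoding could be applied to the other design_* variables, since they share the same five labels.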

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
46.5%
very165
 
15.4%
moderately124
 
11.5%
extremely105
 
9.8%
slightly67
 
6.2%
not38
 
3.5%
at38
 
3.5%
all38
 
3.5%

Most occurring characters

ValueCountFrequency (%)
t1370
15.7%
r893
10.2%
a699
 
8.0%
o661
 
7.6%
e623
 
7.1%
m604
 
6.9%
575
 
6.6%
i566
 
6.5%
p499
 
5.7%
n499
 
5.7%
Other values (11)1762
20.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7677
87.7%
Space Separator575
 
6.6%
Uppercase Letter499
 
5.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1370
17.8%
r893
11.6%
a699
9.1%
o661
8.6%
e623
8.1%
m604
7.9%
i566
7.4%
p499
 
6.5%
n499
 
6.5%
y461
 
6.0%
Other values (5)802
10.4%
Uppercase Letter
ValueCountFrequency (%)
V165
33.1%
M124
24.8%
E105
21.0%
S67
13.4%
N38
 
7.6%
Space Separator
ValueCountFrequency (%)
575
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8176
93.4%
Common575
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1370
16.8%
r893
10.9%
a699
8.5%
o661
8.1%
e623
7.6%
m604
7.4%
i566
6.9%
p499
 
6.1%
n499
 
6.1%
y461
 
5.6%
Other values (10)1301
15.9%
Common
ValueCountFrequency (%)
575
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8751
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1370
15.7%
r893
10.2%
a699
 
8.0%
o661
 
7.6%
e623
 
7.1%
m604
 
6.9%
575
 
6.6%
i566
 
6.5%
p499
 
5.7%
n499
 
5.7%
Other values (11)1762
20.1%

design_num_users
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Not at all important
125 
Very important
106 
Slightly important
95 
Moderately important
89 
Extremely important
84 

Length

Max length20
Median length19
Mean length18.17635271
Min length14

Characters and Unicode

Total characters9070
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowVery important
4th rowModerately important
5th rowNot at all important

Common Values

Value | Count | Frequency (%)
Not at all important | 125 | 25.1%
Very important | 106 | 21.2%
Slightly important | 95 | 19.0%
Moderately important | 89 | 17.8%
Extremely important | 84 | 16.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
40.0%
not125
 
10.0%
at125
 
10.0%
all125
 
10.0%
very106
 
8.5%
slightly95
 
7.6%
moderately89
 
7.1%
extremely84
 
6.7%

Most occurring characters

ValueCountFrequency (%)
t1516
16.7%
a838
9.2%
r778
8.6%
749
8.3%
o713
7.9%
l613
 
6.8%
i594
 
6.5%
m583
 
6.4%
n499
 
5.5%
p499
 
5.5%
Other values (11)1688
18.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7822
86.2%
Space Separator749
 
8.3%
Uppercase Letter499
 
5.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1516
19.4%
a838
10.7%
r778
9.9%
o713
9.1%
l613
7.8%
i594
 
7.6%
m583
 
7.5%
n499
 
6.4%
p499
 
6.4%
e452
 
5.8%
Other values (5)737
9.4%
Uppercase Letter
ValueCountFrequency (%)
N125
25.1%
V106
21.2%
S95
19.0%
M89
17.8%
E84
16.8%
Space Separator
ValueCountFrequency (%)
749
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8321
91.7%
Common749
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1516
18.2%
a838
10.1%
r778
9.3%
o713
8.6%
l613
7.4%
i594
 
7.1%
m583
 
7.0%
n499
 
6.0%
p499
 
6.0%
e452
 
5.4%
Other values (10)1236
14.9%
Common
ValueCountFrequency (%)
749
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1516
16.7%
a838
9.2%
r778
8.6%
749
8.3%
o713
7.9%
l613
 
6.8%
i594
 
6.5%
m583
 
6.4%
n499
 
5.5%
p499
 
5.5%
Other values (11)1688
18.6%

design_res_purp
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
136 
Extremely important
121 
Moderately important
109 
Slightly important
68 
Not at all important
65 

Length

Max length20
Median length19
Mean length17.8496994
Min length14

Characters and Unicode

Total characters8907
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowVery important
4th rowExtremely important
5th rowNot at all important

Common Values

Value | Count | Frequency (%)
Very important | 136 | 27.3%
Extremely important | 121 | 24.2%
Moderately important | 109 | 21.8%
Slightly important | 68 | 13.6%
Not at all important | 65 | 13.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
44.2%
very136
 
12.1%
extremely121
 
10.7%
moderately109
 
9.7%
slightly68
 
6.0%
not65
 
5.8%
at65
 
5.8%
all65
 
5.8%

Most occurring characters

ValueCountFrequency (%)
t1426
16.0%
r865
9.7%
a738
8.3%
o673
 
7.6%
629
 
7.1%
m620
 
7.0%
e596
 
6.7%
i567
 
6.4%
p499
 
5.6%
n499
 
5.6%
Other values (11)1795
20.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7779
87.3%
Space Separator629
 
7.1%
Uppercase Letter499
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1426
18.3%
r865
11.1%
a738
9.5%
o673
8.7%
m620
8.0%
e596
7.7%
i567
 
7.3%
p499
 
6.4%
n499
 
6.4%
l496
 
6.4%
Other values (5)800
10.3%
Uppercase Letter
ValueCountFrequency (%)
V136
27.3%
E121
24.2%
M109
21.8%
S68
13.6%
N65
13.0%
Space Separator
ValueCountFrequency (%)
629
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8278
92.9%
Common629
 
7.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1426
17.2%
r865
10.4%
a738
8.9%
o673
8.1%
m620
7.5%
e596
7.2%
i567
 
6.8%
p499
 
6.0%
n499
 
6.0%
l496
 
6.0%
Other values (10)1299
15.7%
Common
ValueCountFrequency (%)
629
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8907
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1426
16.0%
r865
9.7%
a738
8.3%
o673
 
7.6%
629
 
7.1%
m620
 
7.0%
e596
 
6.7%
i567
 
6.4%
p499
 
5.6%
n499
 
5.6%
Other values (11)1795
20.2%

design_len_data
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
120 
Moderately important
119 
Slightly important
107 
Not at all important
97 
Extremely important
56 

Length

Max length20
Median length19
Mean length18.01603206
Min length14

Characters and Unicode

Total characters8990
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowExtremely important
4th rowVery important
5th rowNot at all important

Common Values

Value | Count | Frequency (%)
Very important | 120 | 24.0%
Moderately important | 119 | 23.8%
Slightly important | 107 | 21.4%
Not at all important | 97 | 19.4%
Extremely important | 56 | 11.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
41.9%
very120
 
10.1%
moderately119
 
10.0%
slightly107
 
9.0%
not97
 
8.1%
at97
 
8.1%
all97
 
8.1%
extremely56
 
4.7%

Most occurring characters

ValueCountFrequency (%)
t1474
16.4%
a812
9.0%
r794
8.8%
o715
8.0%
693
 
7.7%
i606
 
6.7%
l583
 
6.5%
m555
 
6.2%
p499
 
5.6%
n499
 
5.6%
Other values (11)1760
19.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7798
86.7%
Space Separator693
 
7.7%
Uppercase Letter499
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1474
18.9%
a812
10.4%
r794
10.2%
o715
9.2%
i606
7.8%
l583
 
7.5%
m555
 
7.1%
p499
 
6.4%
n499
 
6.4%
e470
 
6.0%
Other values (5)791
10.1%
Uppercase Letter
ValueCountFrequency (%)
V120
24.0%
M119
23.8%
S107
21.4%
N97
19.4%
E56
11.2%
Space Separator
ValueCountFrequency (%)
693
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8297
92.3%
Common693
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1474
17.8%
a812
9.8%
r794
9.6%
o715
8.6%
i606
7.3%
l583
 
7.0%
m555
 
6.7%
p499
 
6.0%
n499
 
6.0%
e470
 
5.7%
Other values (10)1290
15.5%
Common
ValueCountFrequency (%)
693
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1474
16.4%
a812
9.0%
r794
8.8%
o715
8.0%
693
 
7.7%
i606
 
6.7%
l583
 
6.5%
m555
 
6.2%
p499
 
5.6%
n499
 
5.6%
Other values (11)1760
19.6%

design_admin_inter
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Moderately important
139 
Very important
109 
Slightly important
96 
Not at all important
91 
Extremely important
64 

Length

Max length20
Median length19
Mean length18.17635271
Min length14

Characters and Unicode

Total characters9070
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowModerately important
4th rowModerately important
5th rowNot at all important

Common Values

Value | Count | Frequency (%)
Moderately important | 139 | 27.9%
Very important | 109 | 21.8%
Slightly important | 96 | 19.2%
Not at all important | 91 | 18.2%
Extremely important | 64 | 12.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
42.3%
moderately139
 
11.8%
very109
 
9.2%
slightly96
 
8.1%
not91
 
7.7%
at91
 
7.7%
all91
 
7.7%
extremely64
 
5.4%

Most occurring characters

ValueCountFrequency (%)
t1479
16.3%
a820
9.0%
r811
8.9%
o729
8.0%
681
 
7.5%
i595
 
6.6%
l577
 
6.4%
m563
 
6.2%
e515
 
5.7%
p499
 
5.5%
Other values (11)1801
19.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7890
87.0%
Space Separator681
 
7.5%
Uppercase Letter499
 
5.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1479
18.7%
a820
10.4%
r811
10.3%
o729
9.2%
i595
7.5%
l577
 
7.3%
m563
 
7.1%
e515
 
6.5%
p499
 
6.3%
n499
 
6.3%
Other values (5)803
10.2%
Uppercase Letter
ValueCountFrequency (%)
M139
27.9%
V109
21.8%
S96
19.2%
N91
18.2%
E64
12.8%
Space Separator
ValueCountFrequency (%)
681
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8389
92.5%
Common681
 
7.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1479
17.6%
a820
9.8%
r811
9.7%
o729
8.7%
i595
7.1%
l577
 
6.9%
m563
 
6.7%
e515
 
6.1%
p499
 
5.9%
n499
 
5.9%
Other values (10)1302
15.5%
Common
ValueCountFrequency (%)
681
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII9070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1479
16.3%
a820
9.0%
r811
8.9%
o729
8.0%
681
 
7.5%
i595
 
6.6%
l577
 
6.4%
m563
 
6.2%
e515
 
5.7%
p499
 
5.5%
Other values (11)1801
19.9%

design_inter_type
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
162 
Moderately important
145 
Extremely important
94 
Slightly important
65 
Not at all important
33 

Length

Max length20
Median length19
Mean length17.60320641
Min length14

Characters and Unicode

Total characters8784
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSlightly important
2nd rowNot at all important
3rd rowVery important
4th rowVery important
5th rowExtremely important

Common Values

Value | Count | Frequency (%)
Very important | 162 | 32.5%
Moderately important | 145 | 29.1%
Extremely important | 94 | 18.8%
Slightly important | 65 | 13.0%
Not at all important | 33 | 6.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
46.9%
very162
 
15.2%
moderately145
 
13.6%
extremely94
 
8.8%
slightly65
 
6.1%
not33
 
3.1%
at33
 
3.1%
all33
 
3.1%

Most occurring characters

ValueCountFrequency (%)
t1368
15.6%
r900
10.2%
a710
8.1%
o677
 
7.7%
e640
 
7.3%
m593
 
6.8%
565
 
6.4%
i564
 
6.4%
p499
 
5.7%
n499
 
5.7%
Other values (11)1769
20.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7720
87.9%
Space Separator565
 
6.4%
Uppercase Letter499
 
5.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1368
17.7%
r900
11.7%
a710
9.2%
o677
8.8%
e640
8.3%
m593
7.7%
i564
7.3%
p499
 
6.5%
n499
 
6.5%
y466
 
6.0%
Other values (5)804
10.4%
Uppercase Letter
ValueCountFrequency (%)
V162
32.5%
M145
29.1%
E94
18.8%
S65
13.0%
N33
 
6.6%
Space Separator
ValueCountFrequency (%)
565
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8219
93.6%
Common565
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1368
16.6%
r900
11.0%
a710
8.6%
o677
8.2%
e640
7.8%
m593
7.2%
i564
6.9%
p499
 
6.1%
n499
 
6.1%
y466
 
5.7%
Other values (10)1303
15.9%
Common
ValueCountFrequency (%)
565
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8784
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1368
15.6%
r900
10.2%
a710
8.1%
o677
 
7.7%
e640
 
7.3%
m593
 
6.8%
565
 
6.4%
i564
 
6.4%
p499
 
5.7%
n499
 
5.7%
Other values (11)1769
20.1%

design_partic_aware
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Extremely important
191 
Very important
132 
Moderately important
92 
Slightly important
53 
Not at all important
31 

Length

Max length20
Median length19
Mean length17.81763527
Min length14

Characters and Unicode

Total characters8891
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSlightly important
2nd rowModerately important
3rd rowModerately important
4th rowExtremely important
5th rowSlightly important

Common Values

Value | Count | Frequency (%)
Extremely important | 191 | 38.3%
Very important | 132 | 26.5%
Moderately important | 92 | 18.4%
Slightly important | 53 | 10.6%
Not at all important | 31 | 6.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
47.1%
extremely191
 
18.0%
very132
 
12.5%
moderately92
 
8.7%
slightly53
 
5.0%
not31
 
2.9%
at31
 
2.9%
all31
 
2.9%

Most occurring characters

ValueCountFrequency (%)
t1396
15.7%
r914
10.3%
e698
 
7.9%
m690
 
7.8%
a653
 
7.3%
o622
 
7.0%
561
 
6.3%
i552
 
6.2%
p499
 
5.6%
n499
 
5.6%
Other values (11)1807
20.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7831
88.1%
Space Separator561
 
6.3%
Uppercase Letter499
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1396
17.8%
r914
11.7%
e698
8.9%
m690
8.8%
a653
8.3%
o622
7.9%
i552
 
7.0%
p499
 
6.4%
n499
 
6.4%
y468
 
6.0%
Other values (5)840
10.7%
Uppercase Letter
ValueCountFrequency (%)
E191
38.3%
V132
26.5%
M92
18.4%
S53
 
10.6%
N31
 
6.2%
Space Separator
ValueCountFrequency (%)
561
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8330
93.7%
Common561
 
6.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1396
16.8%
r914
11.0%
e698
8.4%
m690
8.3%
a653
7.8%
o622
7.5%
i552
 
6.6%
p499
 
6.0%
n499
 
6.0%
y468
 
5.6%
Other values (10)1339
16.1%
Common
ValueCountFrequency (%)
561
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1396
15.7%
r914
10.3%
e698
 
7.9%
m690
 
7.8%
a653
 
7.3%
o622
 
7.0%
561
 
6.3%
i552
 
6.2%
p499
 
5.6%
n499
 
5.6%
Other values (11)1807
20.3%

design_inter_impact
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
139 
Moderately important
114 
Extremely important
107 
Slightly important
80 
Not at all important
59 

Length

Max length20
Median length19
Mean length17.79358717
Min length14

Characters and Unicode

Total characters8879
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowExtremely important
4th rowVery important
5th rowNot at all important

Common Values

Value | Count | Frequency (%)
Very important | 139 | 27.9%
Moderately important | 114 | 22.8%
Extremely important | 107 | 21.4%
Slightly important | 80 | 16.0%
Not at all important | 59 | 11.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
44.7%
very139
 
12.5%
moderately114
 
10.2%
extremely107
 
9.6%
slightly80
 
7.2%
not59
 
5.3%
at59
 
5.3%
all59
 
5.3%

Most occurring characters

ValueCountFrequency (%)
t1417
16.0%
r859
9.7%
a731
8.2%
o672
 
7.6%
617
 
6.9%
m606
 
6.8%
e581
 
6.5%
i579
 
6.5%
l499
 
5.6%
p499
 
5.6%
Other values (11)1819
20.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7763
87.4%
Space Separator617
 
6.9%
Uppercase Letter499
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1417
18.3%
r859
11.1%
a731
9.4%
o672
8.7%
m606
7.8%
e581
7.5%
i579
7.5%
l499
 
6.4%
p499
 
6.4%
n499
 
6.4%
Other values (5)821
10.6%
Uppercase Letter
ValueCountFrequency (%)
V139
27.9%
M114
22.8%
E107
21.4%
S80
16.0%
N59
11.8%
Space Separator
ValueCountFrequency (%)
617
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8262
93.1%
Common617
 
6.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1417
17.2%
r859
10.4%
a731
8.8%
o672
8.1%
m606
7.3%
e581
7.0%
i579
7.0%
l499
 
6.0%
p499
 
6.0%
n499
 
6.0%
Other values (10)1320
16.0%
Common
ValueCountFrequency (%)
617
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8879
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1417
16.0%
r859
9.7%
a731
8.2%
o672
 
7.6%
617
 
6.9%
m606
 
6.8%
e581
 
6.5%
i579
 
6.5%
l499
 
5.6%
p499
 
5.6%
Other values (11)1819
20.5%

design_type_data
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size4.0 KiB
Very important
144 
Moderately important
130 
Extremely important
106 
Slightly important
70 
Not at all important
49 

Length

Max length20
Median length19
Mean length17.7755511
Min length14

Characters and Unicode

Total characters8870
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNot at all important
2nd rowNot at all important
3rd rowNot at all important
4th rowVery important
5th rowExtremely important

Common Values

Value | Count | Frequency (%)
Very important | 144 | 28.9%
Moderately important | 130 | 26.1%
Extremely important | 106 | 21.2%
Slightly important | 70 | 14.0%
Not at all important | 49 | 9.8%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
important499
45.5%
very144
 
13.1%
moderately130
 
11.9%
extremely106
 
9.7%
slightly70
 
6.4%
not49
 
4.5%
at49
 
4.5%
all49
 
4.5%

Most occurring characters

ValueCountFrequency (%)
t1402
15.8%
r879
9.9%
a727
8.2%
o678
 
7.6%
e616
 
6.9%
m605
 
6.8%
597
 
6.7%
i569
 
6.4%
p499
 
5.6%
n499
 
5.6%
Other values (11)1799
20.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7774
87.6%
Space Separator597
 
6.7%
Uppercase Letter499
 
5.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1402
18.0%
r879
11.3%
a727
9.4%
o678
8.7%
e616
7.9%
m605
7.8%
i569
7.3%
p499
 
6.4%
n499
 
6.4%
l474
 
6.1%
Other values (5)826
10.6%
Uppercase Letter
ValueCountFrequency (%)
V144
28.9%
M130
26.1%
E106
21.2%
S70
14.0%
N49
 
9.8%
Space Separator
ValueCountFrequency (%)
597
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8273
93.3%
Common597
 
6.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1402
16.9%
r879
10.6%
a727
8.8%
o678
8.2%
e616
7.4%
m605
7.3%
i569
6.9%
p499
 
6.0%
n499
 
6.0%
l474
 
5.7%
Other values (10)1325
16.0%
Common
ValueCountFrequency (%)
597
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8870
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t1402
15.8%
r879
9.9%
a727
8.2%
o678
 
7.6%
e616
 
6.9%
m605
 
6.8%
597
 
6.7%
i569
 
6.4%
p499
 
5.6%
n499
 
5.6%
Other values (11)1799
20.3%

design_add_fac
Categorical

HIGH CARDINALITY
MISSING

Distinct261
Distinct (%)65.9%
Missing103
Missing (%)20.6%
Memory size4.0 KiB
No
48 
no
41 
None
 
13
na
 
12
No.
 
9
Other values (256)
273 

Length

Max length661
Median length317
Mean length68.42171717
Min length2

Characters and Unicode

Total characters27095
Distinct characters70
Distinct categories11 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique250 ?
Unique (%)63.1%

Sample

1st rowNo.
2nd rowThe only aspects of social media research that would cause concern for me is saving photographs or imaging data.
3rd rowNone that I can think of, other than what has been asked already.
4th rowReducing any type of hate is always a good thing.
5th rowno

Common Values

ValueCountFrequency (%)
No48
 
9.6%
no41
 
8.2%
None13
 
2.6%
na12
 
2.4%
No.9
 
1.8%
none9
 
1.8%
NO5
 
1.0%
None that I can think of.3
 
0.6%
Nope2
 
0.4%
not that i can think of2
 
0.4%
Other values (251)252
50.5%
(Missing)103
20.6%
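
Many of the most common design_add_fac answers differ only in case or trailing punctuation, which contributes to the high-cardinality alert. The sketch below shows one possible way to collapse such variants before analysis. The `normalise` helper and its rules are assumptions made for illustration, not something the report or its authors applied.

```python
from collections import Counter

# Short answers and counts copied from the Common Values table above.
raw_counts = {
    "No": 48,
    "no": 41,
    "None": 13,
    "na": 12,
    "No.": 9,
    "none": 9,
    "NO": 5,
    "Nope": 2,
}


def normalise(answer: str) -> str:
    """Trim whitespace, drop trailing '.' or '!', and fold case (assumed rules)."""
    return answer.strip().rstrip(".!").casefold()


collapsed = Counter()
for answer, n in raw_counts.items():
    collapsed[normalise(answer)] += n

for answer, n in collapsed.most_common():
    print(answer, n)
```

Whether "no", "none" and "na" should then be merged into a single "no additional factors" code is an analysis decision that the report does not address.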

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the204
 
4.2%
of151
 
3.1%
i134
 
2.8%
is132
 
2.7%
no129
 
2.7%
to128
 
2.7%
that94
 
1.9%
a81
 
1.7%
and81
 
1.7%
not70
 
1.5%
Other values (1024)3621
75.0%

Most occurring characters

ValueCountFrequency (%)
4517
16.7%
e2626
 
9.7%
t2117
 
7.8%
o1785
 
6.6%
a1776
 
6.6%
n1636
 
6.0%
i1575
 
5.8%
s1379
 
5.1%
r1250
 
4.6%
h1094
 
4.0%
Other values (60)7340
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter21431
79.1%
Space Separator4517
 
16.7%
Other Punctuation561
 
2.1%
Uppercase Letter534
 
2.0%
Dash Punctuation19
 
0.1%
Control12
 
< 0.1%
Open Punctuation6
 
< 0.1%
Close Punctuation5
 
< 0.1%
Decimal Number5
 
< 0.1%
Final Punctuation4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2626
12.3%
t2117
 
9.9%
o1785
 
8.3%
a1776
 
8.3%
n1636
 
7.6%
i1575
 
7.3%
s1379
 
6.4%
r1250
 
5.8%
h1094
 
5.1%
l816
 
3.8%
Other values (16)5377
25.1%
Uppercase Letter
ValueCountFrequency (%)
I192
36.0%
N130
24.3%
T40
 
7.5%
A30
 
5.6%
W28
 
5.2%
H18
 
3.4%
O13
 
2.4%
S11
 
2.1%
C9
 
1.7%
M9
 
1.7%
Other values (13)54
 
10.1%
Other Punctuation
ValueCountFrequency (%)
.302
53.8%
,129
23.0%
'87
 
15.5%
/16
 
2.9%
?14
 
2.5%
"10
 
1.8%
:1
 
0.2%
1
 
0.2%
!1
 
0.2%
Decimal Number
ValueCountFrequency (%)
11
20.0%
01
20.0%
31
20.0%
41
20.0%
21
20.0%
Space Separator
ValueCountFrequency (%)
4517
100.0%
Dash Punctuation
ValueCountFrequency (%)
-19
100.0%
Control
ValueCountFrequency (%)
12
100.0%
Open Punctuation
ValueCountFrequency (%)
(6
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Final Punctuation
ValueCountFrequency (%)
4
100.0%
Math Symbol
ValueCountFrequency (%)
+1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin21965
81.1%
Common5130
 
18.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2626
12.0%
t2117
 
9.6%
o1785
 
8.1%
a1776
 
8.1%
n1636
 
7.4%
i1575
 
7.2%
s1379
 
6.3%
r1250
 
5.7%
h1094
 
5.0%
l816
 
3.7%
Other values (39)5911
26.9%
Common
ValueCountFrequency (%)
4517
88.1%
.302
 
5.9%
,129
 
2.5%
'87
 
1.7%
-19
 
0.4%
/16
 
0.3%
?14
 
0.3%
12
 
0.2%
"10
 
0.2%
(6
 
0.1%
Other values (11)18
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII27090
> 99.9%
Punctuation5
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4517
16.7%
e2626
 
9.7%
t2117
 
7.8%
o1785
 
6.6%
a1776
 
6.6%
n1636
 
6.0%
i1575
 
5.8%
s1379
 
5.1%
r1250
 
4.6%
h1094
 
4.0%
Other values (58)7335
27.1%
Punctuation
ValueCountFrequency (%)
4
80.0%
1
 
20.0%

rank_sci_repro
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.521042084
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile2
Q15
median6
Q37
95-th percentile7
Maximum7
Range6
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.679386636
Coefficient of variation (CV)0.3041792855
Kurtosis0.1087548301
Mean5.521042084
Median Absolute Deviation (MAD)1
Skewness-1.025007039
Sum2755
Variance2.820339474
MonotonicityNot monotonic
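
The descriptive statistics above can be checked against the frequency table for rank_sci_repro. The following is a minimal sketch using only Python's standard `statistics` module; the `freq` dictionary is transcribed from this report, and treating the reported variance as the sample variance (n - 1 denominator) is an assumption that happens to match the printed figures.

```python
import statistics

# Rank value -> number of respondents, transcribed from the rank_sci_repro table.
freq = {7: 203, 6: 102, 5: 70, 4: 49, 3: 42, 2: 17, 1: 16}

# Expand the frequency table into one observation per respondent.
values = [rank for rank, n in freq.items() for _ in range(n)]

n_obs = len(values)
total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation
median = statistics.median(values)

print(n_obs, total, round(mean, 6), round(variance, 6), round(std, 6), round(cv, 6), median)
```

The same approach applies to the other rank_* variables by swapping in their frequency tables.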
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
7      203    40.7%
6      102    20.4%
5      70     14.0%
4      49     9.8%
3      42     8.4%
2      17     3.4%
1      16     3.2%

Value  Count  Frequency (%)
1      16     3.2%
2      17     3.4%
3      42     8.4%
4      49     9.8%
5      70     14.0%
6      102    20.4%
7      203    40.7%

Value  Count  Frequency (%)
7      203    40.7%
6      102    20.4%
5      70     14.0%
4      49     9.8%
3      42     8.4%
2      17     3.4%
1      16     3.2%

rank_resp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.527054108
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median3
Q35
95-th percentile7
Maximum7
Range6
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.057596743
Coefficient of variation (CV)0.5833754402
Kurtosis-1.187605078
Mean3.527054108
Median Absolute Deviation (MAD)2
Skewness0.3541560179
Sum1760
Variance4.233704357
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
1      106    21.2%
2      93     18.6%
3      71     14.2%
4      70     14.0%
7      62     12.4%
6      56     11.2%
5      41     8.2%

Value  Count  Frequency (%)
1      106    21.2%
2      93     18.6%
3      71     14.2%
4      70     14.0%
5      41     8.2%
6      56     11.2%
7      62     12.4%

Value  Count  Frequency (%)
7      62     12.4%
6      56     11.2%
5      41     8.2%
4      70     14.0%
3      71     14.2%
2      93     18.6%
1      106    21.2%

rank_just
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.651302605
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q14
median5
Q36
95-th percentile7
Maximum7
Range6
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.671474984
Coefficient of variation (CV)0.3593563192
Kurtosis-0.5866613384
Mean4.651302605
Median Absolute Deviation (MAD)1
Skewness-0.4219703321
Sum2321
Variance2.793828621
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
5      115    23.0%
6      96     19.2%
4      91     18.2%
7      75     15.0%
3      63     12.6%
2      33     6.6%
1      26     5.2%

Value  Count  Frequency (%)
1      26     5.2%
2      33     6.6%
3      63     12.6%
4      91     18.2%
5      115    23.0%
6      96     19.2%
7      75     15.0%

Value  Count  Frequency (%)
7      75     15.0%
6      96     19.2%
5      115    23.0%
4      91     18.2%
3      63     12.6%
2      33     6.6%
1      26     5.2%

rank_anony
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.054108216
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median3
Q34
95-th percentile6
Maximum7
Range6
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.587816994
Coefficient of variation (CV)0.5198954594
Kurtosis-0.4222403
Mean3.054108216
Median Absolute Deviation (MAD)1
Skewness0.6240587048
Sum1524
Variance2.521162808
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
2      141    28.3%
3      105    21.0%
1      81     16.2%
4      74     14.8%
5      51     10.2%
6      34     6.8%
7      13     2.6%

Value  Count  Frequency (%)
1      81     16.2%
2      141    28.3%
3      105    21.0%
4      74     14.8%
5      51     10.2%
6      34     6.8%
7      13     2.6%

Value  Count  Frequency (%)
7      13     2.6%
6      34     6.8%
5      51     10.2%
4      74     14.8%
3      105    21.0%
2      141    28.3%
1      81     16.2%

rank_harms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.69739479
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median2
Q34
95-th percentile6
Maximum7
Range6
Interquartile range (IQR)3

Descriptive statistics

Standard deviation1.808784129
Coefficient of variation (CV)0.6705670732
Kurtosis-0.4651366385
Mean2.69739479
Median Absolute Deviation (MAD)1
Skewness0.8169441724
Sum1346
Variance3.271700027
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
1      189    37.9%
2      91     18.2%
3      66     13.2%
4      58     11.6%
5      45     9.0%
6      30     6.0%
7      20     4.0%

Value  Count  Frequency (%)
1      189    37.9%
2      91     18.2%
3      66     13.2%
4      58     11.6%
5      45     9.0%
6      30     6.0%
7      20     4.0%

Value  Count  Frequency (%)
7      20     4.0%
6      30     6.0%
5      45     9.0%
4      58     11.6%
3      66     13.2%
2      91     18.2%
1      189    37.9%

rank_balance
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4.975951904
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile2
Q14
median5
Q36
95-th percentile7
Maximum7
Range6
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.616749319
Coefficient of variation (CV)0.3249125695
Kurtosis-0.4443151664
Mean4.975951904
Median Absolute Deviation (MAD)1
Skewness-0.5992454593
Sum2483
Variance2.613878359
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
6      122    24.4%
5      111    22.2%
7      97     19.4%
4      68     13.6%
3      58     11.6%
2      28     5.6%
1      15     3.0%

Value  Count  Frequency (%)
1      15     3.0%
2      28     5.6%
3      58     11.6%
4      68     13.6%
5      111    22.2%
6      122    24.4%
7      97     19.4%

Value  Count  Frequency (%)
7      97     19.4%
6      122    24.4%
5      111    22.2%
4      68     13.6%
3      58     11.6%
2      28     5.6%
1      15     3.0%

rank_pub_interst
Real number (ℝ≥0)

Distinct7
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.573146293
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size4.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median3
Q35
95-th percentile7
Maximum7
Range6
Interquartile range (IQR)3

Descriptive statistics

Standard deviation1.760841438
Coefficient of variation (CV)0.4927985854
Kurtosis-0.9434350009
Mean3.573146293
Median Absolute Deviation (MAD)1
Skewness0.2559463962
Sum1783
Variance3.100562571
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
2      96     19.2%
3      94     18.8%
4      89     17.8%
1      66     13.2%
5      66     13.2%
6      59     11.8%
7      29     5.8%

Value  Count  Frequency (%)
1      66     13.2%
2      96     19.2%
3      94     18.8%
4      89     17.8%
5      66     13.2%
6      59     11.8%
7      29     5.8%

Value  Count  Frequency (%)
7      29     5.8%
6      59     11.8%
5      66     13.2%
4      89     17.8%
3      94     18.8%
2      96     19.2%
1      66     13.2%

rank_add_fac_1
Categorical

HIGH CARDINALITY
MISSING

Distinct118
Distinct (%)79.7%
Missing351
Missing (%)70.3%
Memory size4.0 KiB
na
 
11
none
 
10
Na
 
5
None
 
5
No
 
3
Other values (113)
114 

Length

Max length338
Median length134
Mean length50.28378378
Min length1

Characters and Unicode

Total characters7442
Distinct characters63
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique112 ?
Unique (%)75.7%

Sample

1st rowNa
2nd rowOffer results to participants
3rd rowA full disclosure of any political organizations of which a researcher belongs to or has donated to within a previous time frame (such as 4 yrs).
4th rowIncrease your visibility
5th rowThe researchers should not intrude into the user's personal lives

Common Values

ValueCountFrequency (%)
na11
 
2.2%
none10
 
2.0%
Na5
 
1.0%
None5
 
1.0%
No3
 
0.6%
No additional factors2
 
0.4%
Avoid putting out false information1
 
0.2%
inclusion of all sides (far right, far left, moderate) when it comes to research on things like hate speech and false information1
 
0.2%
Understanding the limits of the Internet.1
 
0.2%
The social media service (twittet, facebook, etc) knows the data is being collected.1
 
0.2%
Other values (108)108
 
21.6%
(Missing)351
70.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the67
 
5.4%
of42
 
3.4%
to37
 
3.0%
be26
 
2.1%
and21
 
1.7%
for21
 
1.7%
study20
 
1.6%
is19
 
1.5%
participants18
 
1.5%
or17
 
1.4%
Other values (489)945
76.6%

Most occurring characters

ValueCountFrequency (%)
1105
14.8%
e694
 
9.3%
t621
 
8.3%
a517
 
6.9%
o482
 
6.5%
i478
 
6.4%
n453
 
6.1%
s402
 
5.4%
r385
 
5.2%
h254
 
3.4%
Other values (53)2051
27.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6025
81.0%
Space Separator1105
 
14.8%
Uppercase Letter135
 
1.8%
Other Punctuation129
 
1.7%
Decimal Number12
 
0.2%
Dash Punctuation11
 
0.1%
Close Punctuation11
 
0.1%
Open Punctuation11
 
0.1%
Control2
 
< 0.1%
Final Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e694
11.5%
t621
10.3%
a517
 
8.6%
o482
 
8.0%
i478
 
7.9%
n453
 
7.5%
s402
 
6.7%
r385
 
6.4%
h254
 
4.2%
l224
 
3.7%
Other values (16)1515
25.1%
Uppercase Letter
ValueCountFrequency (%)
I24
17.8%
N20
14.8%
T11
8.1%
A11
8.1%
R10
 
7.4%
D9
 
6.7%
P7
 
5.2%
M6
 
4.4%
C6
 
4.4%
H6
 
4.4%
Other values (9)25
18.5%
Other Punctuation
ValueCountFrequency (%)
.53
41.1%
,36
27.9%
'21
 
16.3%
"8
 
6.2%
/6
 
4.7%
?4
 
3.1%
!1
 
0.8%
Decimal Number
ValueCountFrequency (%)
25
41.7%
43
25.0%
12
 
16.7%
51
 
8.3%
61
 
8.3%
Space Separator
ValueCountFrequency (%)
1105
100.0%
Dash Punctuation
ValueCountFrequency (%)
-11
100.0%
Close Punctuation
ValueCountFrequency (%)
)11
100.0%
Open Punctuation
ValueCountFrequency (%)
(11
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6160
82.8%
Common1282
 
17.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e694
11.3%
t621
 
10.1%
a517
 
8.4%
o482
 
7.8%
i478
 
7.8%
n453
 
7.4%
s402
 
6.5%
r385
 
6.2%
h254
 
4.1%
l224
 
3.6%
Other values (35)1650
26.8%
Common
ValueCountFrequency (%)
1105
86.2%
.53
 
4.1%
,36
 
2.8%
'21
 
1.6%
-11
 
0.9%
)11
 
0.9%
(11
 
0.9%
"8
 
0.6%
/6
 
0.5%
25
 
0.4%
Other values (8)15
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII7441
> 99.9%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1105
14.9%
e694
 
9.3%
t621
 
8.3%
a517
 
6.9%
o482
 
6.5%
i478
 
6.4%
n453
 
6.1%
s402
 
5.4%
r385
 
5.2%
h254
 
3.4%
Other values (52)2050
27.6%
Punctuation
ValueCountFrequency (%)
1
100.0%

rank_add_fac_1_pos
Categorical

HIGH CORRELATION
MISSING

Distinct14
Distinct (%)8.9%
Missing342
Missing (%)68.5%
Memory size4.0 KiB
8
44 
1
42 
0
12 
3
12 
4
10 
Other values (9)
37 

Length

Max length4
Median length1
Mean length1.121019108
Min length1

Characters and Unicode

Total characters176
Distinct characters15
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row8
2nd row8
3rd row6
4th row8
5th row8

Common Values

ValueCountFrequency (%)
844
 
8.8%
142
 
8.4%
012
 
2.4%
312
 
2.4%
410
 
2.0%
107
 
1.4%
66
 
1.2%
76
 
1.2%
55
 
1.0%
na4
 
0.8%
Other values (4)9
 
1.8%
(Missing)342
68.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
844
28.0%
142
26.8%
012
 
7.6%
312
 
7.6%
410
 
6.4%
107
 
4.5%
66
 
3.8%
76
 
3.8%
na6
 
3.8%
55
 
3.2%
Other values (3)7
 
4.5%

Most occurring characters

ValueCountFrequency (%)
149
27.8%
844
25.0%
019
 
10.8%
312
 
6.8%
410
 
5.7%
n8
 
4.5%
66
 
3.4%
76
 
3.4%
a6
 
3.4%
55
 
2.8%
Other values (5)11
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number156
88.6%
Lowercase Letter18
 
10.2%
Uppercase Letter2
 
1.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
149
31.4%
844
28.2%
019
 
12.2%
312
 
7.7%
410
 
6.4%
66
 
3.8%
76
 
3.8%
55
 
3.2%
24
 
2.6%
91
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
n8
44.4%
a6
33.3%
o2
 
11.1%
e2
 
11.1%
Uppercase Letter
ValueCountFrequency (%)
N2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common156
88.6%
Latin20
 
11.4%

Most frequent character per script

Common
ValueCountFrequency (%)
149
31.4%
844
28.2%
019
 
12.2%
312
 
7.7%
410
 
6.4%
66
 
3.8%
76
 
3.8%
55
 
3.2%
24
 
2.6%
91
 
0.6%
Latin
ValueCountFrequency (%)
n8
40.0%
a6
30.0%
o2
 
10.0%
e2
 
10.0%
N2
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
149
27.8%
844
25.0%
019
 
10.8%
312
 
6.8%
410
 
5.7%
n8
 
4.5%
66
 
3.4%
76
 
3.4%
a6
 
3.4%
55
 
2.8%
Other values (5)11
 
6.2%

rank_add_fac_2
Categorical

HIGH CORRELATION
MISSING

Distinct47
Distinct (%)69.1%
Missing431
Missing (%)86.4%
Memory size4.0 KiB
na
9
none
7
Na
 
4
No
 
3
None
 
3
Other values (42)
42 

Length

Max length134
Median length74
Mean length22.5
Min length2

Characters and Unicode

Total characters1530
Distinct characters48
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique42 ?
Unique (%)61.8%

Sample

1st rowNa
2nd rowcontent-sharing
3rd rowfollow up after
4th rowna
5th rowna

Common Values

ValueCountFrequency (%)
na9
 
1.8%
none7
 
1.4%
Na4
 
0.8%
No3
 
0.6%
None3
 
0.6%
Interact as a researcher. It will carry more weight if people know who is suggesting or informing. And this does matter. 1
 
0.2%
At the conclusion, the user should have the option to have their data dismissed.1
 
0.2%
Information about who the researchers are1
 
0.2%
Data Retention Policies1
 
0.2%
Location (urban, rural)1
 
0.2%
Other values (37)37
 
7.4%
(Missing)431
86.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the15
 
5.7%
na13
 
5.0%
none11
 
4.2%
of7
 
2.7%
to7
 
2.7%
no6
 
2.3%
study5
 
1.9%
is4
 
1.5%
and4
 
1.5%
be4
 
1.5%
Other values (154)186
71.0%

Most occurring characters

ValueCountFrequency (%)
198
12.9%
e151
 
9.9%
a116
 
7.6%
t113
 
7.4%
n108
 
7.1%
i103
 
6.7%
o98
 
6.4%
s93
 
6.1%
r80
 
5.2%
h56
 
3.7%
Other values (38)414
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1269
82.9%
Space Separator198
 
12.9%
Uppercase Letter48
 
3.1%
Other Punctuation12
 
0.8%
Open Punctuation1
 
0.1%
Close Punctuation1
 
0.1%
Dash Punctuation1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e151
11.9%
a116
 
9.1%
t113
 
8.9%
n108
 
8.5%
i103
 
8.1%
o98
 
7.7%
s93
 
7.3%
r80
 
6.3%
h56
 
4.4%
u44
 
3.5%
Other values (15)307
24.2%
Uppercase Letter
ValueCountFrequency (%)
N12
25.0%
P9
18.8%
I4
 
8.3%
A4
 
8.3%
M4
 
8.3%
L2
 
4.2%
W2
 
4.2%
R2
 
4.2%
C1
 
2.1%
D1
 
2.1%
Other values (7)7
14.6%
Other Punctuation
ValueCountFrequency (%)
.8
66.7%
,4
33.3%
Space Separator
ValueCountFrequency (%)
198
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1317
86.1%
Common213
 
13.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e151
11.5%
a116
 
8.8%
t113
 
8.6%
n108
 
8.2%
i103
 
7.8%
o98
 
7.4%
s93
 
7.1%
r80
 
6.1%
h56
 
4.3%
u44
 
3.3%
Other values (32)355
27.0%
Common
ValueCountFrequency (%)
198
93.0%
.8
 
3.8%
,4
 
1.9%
(1
 
0.5%
)1
 
0.5%
-1
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1530
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
198
12.9%
e151
 
9.9%
a116
 
7.6%
t113
 
7.4%
n108
 
7.1%
i103
 
6.7%
o98
 
6.4%
s93
 
6.1%
r80
 
5.2%
h56
 
3.7%
Other values (38)414
27.1%

rank_add_fac_2_pos
Categorical

HIGH CORRELATION
MISSING

Distinct15
Distinct (%)15.5%
Missing402
Missing (%)80.6%
Memory size4.0 KiB
9
29 
2
20 
0
11 
1
8
10
7
Other values (10)
22 

Length

Max length4
Median length1
Mean length1.226804124
Min length1

Characters and Unicode

Total characters119
Distinct characters16
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)5.2%

Sample

1st row9
2nd row5
3rd row3
4th rowna
5th row3

Common Values

ValueCountFrequency (%)
929
 
5.8%
220
 
4.0%
011
 
2.2%
18
 
1.6%
107
 
1.4%
na5
 
1.0%
54
 
0.8%
84
 
0.8%
32
 
0.4%
Na2
 
0.4%
Other values (5)5
 
1.0%
(Missing)402
80.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
929
29.9%
220
20.6%
011
 
11.3%
18
 
8.2%
na8
 
8.2%
107
 
7.2%
54
 
4.1%
84
 
4.1%
32
 
2.1%
none2
 
2.1%
Other values (2)2
 
2.1%

Most occurring characters

ValueCountFrequency (%)
929
24.4%
220
16.8%
018
15.1%
115
12.6%
n8
 
6.7%
a7
 
5.9%
54
 
3.4%
84
 
3.4%
N4
 
3.4%
32
 
1.7%
Other values (6)8
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number94
79.0%
Lowercase Letter19
 
16.0%
Uppercase Letter5
 
4.2%
Space Separator1
 
0.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
929
30.9%
220
21.3%
018
19.1%
115
16.0%
54
 
4.3%
84
 
4.3%
32
 
2.1%
71
 
1.1%
61
 
1.1%
Lowercase Letter
ValueCountFrequency (%)
n8
42.1%
a7
36.8%
o2
 
10.5%
e2
 
10.5%
Uppercase Letter
ValueCountFrequency (%)
N4
80.0%
A1
 
20.0%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common95
79.8%
Latin24
 
20.2%

Most frequent character per script

Common
ValueCountFrequency (%)
929
30.5%
220
21.1%
018
18.9%
115
15.8%
54
 
4.2%
84
 
4.2%
32
 
2.1%
71
 
1.1%
1
 
1.1%
61
 
1.1%
Latin
ValueCountFrequency (%)
n8
33.3%
a7
29.2%
N4
16.7%
o2
 
8.3%
e2
 
8.3%
A1
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII119
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
929
24.4%
220
16.8%
018
15.1%
115
12.6%
n8
 
6.7%
a7
 
5.9%
54
 
3.4%
84
 
3.4%
N4
 
3.4%
32
 
1.7%
Other values (6)8
 
6.7%

rank_add_fac_3
Categorical

HIGH CORRELATION
MISSING

Distinct41
Distinct (%)64.1%
Missing435
Missing (%)87.2%
Memory size4.0 KiB
na
9
none
8
Na
6
No
 
3
None
 
2
Other values (36)
36 

Length

Max length95
Median length76
Mean length20.296875
Min length2

Characters and Unicode

Total characters1299
Distinct characters52
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique36 ?
Unique (%)56.2%

Sample

1st rowNa
2nd rowcommunication
3rd rowgive results after entire experiment is done
4th rowna
5th rowna

Common Values

ValueCountFrequency (%)
na9
 
1.8%
none8
 
1.6%
Na6
 
1.2%
No3
 
0.6%
None2
 
0.4%
Adherence to Regulations (like GDPR)1
 
0.2%
ethical application in the real world1
 
0.2%
No one should die1
 
0.2%
Income1
 
0.2%
N/a1
 
0.2%
Other values (31)31
 
6.2%
(Missing)435
87.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na15
 
6.6%
the13
 
5.8%
none11
 
4.9%
of9
 
4.0%
no7
 
3.1%
study6
 
2.7%
be5
 
2.2%
and5
 
2.2%
to5
 
2.2%
data5
 
2.2%
Other values (116)145
64.2%

Most occurring characters

ValueCountFrequency (%)
165
12.7%
e127
 
9.8%
n106
 
8.2%
t106
 
8.2%
a95
 
7.3%
o92
 
7.1%
i89
 
6.9%
s68
 
5.2%
r54
 
4.2%
c43
 
3.3%
Other values (42)354
27.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1063
81.8%
Space Separator165
 
12.7%
Uppercase Letter47
 
3.6%
Other Punctuation18
 
1.4%
Control3
 
0.2%
Final Punctuation1
 
0.1%
Open Punctuation1
 
0.1%
Close Punctuation1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e127
11.9%
n106
10.0%
t106
10.0%
a95
 
8.9%
o92
 
8.7%
i89
 
8.4%
s68
 
6.4%
r54
 
5.1%
c43
 
4.0%
d42
 
4.0%
Other values (14)241
22.7%
Uppercase Letter
ValueCountFrequency (%)
N14
29.8%
R4
 
8.5%
C4
 
8.5%
I3
 
6.4%
P3
 
6.4%
A3
 
6.4%
U2
 
4.3%
D2
 
4.3%
H2
 
4.3%
W2
 
4.3%
Other values (7)8
17.0%
Other Punctuation
ValueCountFrequency (%)
.10
55.6%
,3
 
16.7%
/2
 
11.1%
?1
 
5.6%
'1
 
5.6%
:1
 
5.6%
Space Separator
ValueCountFrequency (%)
165
100.0%
Control
ValueCountFrequency (%)
3
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1110
85.5%
Common189
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e127
11.4%
n106
 
9.5%
t106
 
9.5%
a95
 
8.6%
o92
 
8.3%
i89
 
8.0%
s68
 
6.1%
r54
 
4.9%
c43
 
3.9%
d42
 
3.8%
Other values (31)288
25.9%
Common
ValueCountFrequency (%)
165
87.3%
.10
 
5.3%
3
 
1.6%
,3
 
1.6%
/2
 
1.1%
1
 
0.5%
?1
 
0.5%
'1
 
0.5%
(1
 
0.5%
)1
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1298
99.9%
Punctuation1
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
165
12.7%
e127
 
9.8%
n106
 
8.2%
t106
 
8.2%
a95
 
7.3%
o92
 
7.1%
i89
 
6.9%
s68
 
5.2%
r54
 
4.2%
c43
 
3.3%
Other values (41)353
27.2%
Punctuation
ValueCountFrequency (%)
1
100.0%

rank_add_fac_3_pos
Categorical

HIGH CORRELATION
MISSING

Distinct13
Distinct (%)14.3%
Missing408
Missing (%)81.8%
Memory size4.0 KiB
10
31 
3
15 
0
11 
1
7
na
5
Other values (8)
22 

Length

Max length4
Median length1
Mean length1.494505495
Min length1

Characters and Unicode

Total characters136
Distinct characters14
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)1.1%

Sample

1st row10
2nd row8
3rd row2
4th rowna
5th row2

Common Values

ValueCountFrequency (%)
1031
 
6.2%
315
 
3.0%
011
 
2.2%
17
 
1.4%
na5
 
1.0%
95
 
1.0%
84
 
0.8%
24
 
0.8%
62
 
0.4%
none2
 
0.4%
Other values (3)5
 
1.0%
(Missing)408
81.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1031
34.1%
315
16.5%
011
 
12.1%
17
 
7.7%
na7
 
7.7%
95
 
5.5%
84
 
4.4%
24
 
4.4%
62
 
2.2%
none2
 
2.2%
Other values (2)3
 
3.3%

Most occurring characters

ValueCountFrequency (%)
042
30.9%
139
28.7%
315
 
11.0%
n9
 
6.6%
a7
 
5.1%
95
 
3.7%
84
 
2.9%
24
 
2.9%
62
 
1.5%
o2
 
1.5%
Other values (4)7
 
5.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number113
83.1%
Lowercase Letter20
 
14.7%
Uppercase Letter3
 
2.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
042
37.2%
139
34.5%
315
 
13.3%
95
 
4.4%
84
 
3.5%
24
 
3.5%
62
 
1.8%
42
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
n9
45.0%
a7
35.0%
o2
 
10.0%
e2
 
10.0%
Uppercase Letter
ValueCountFrequency (%)
N2
66.7%
O1
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common113
83.1%
Latin23
 
16.9%

Most frequent character per script

Common
ValueCountFrequency (%)
042
37.2%
139
34.5%
315
 
13.3%
95
 
4.4%
84
 
3.5%
24
 
3.5%
62
 
1.8%
42
 
1.8%
Latin
ValueCountFrequency (%)
n9
39.1%
a7
30.4%
o2
 
8.7%
e2
 
8.7%
N2
 
8.7%
O1
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII136
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
042
30.9%
139
28.7%
315
 
11.0%
n9
 
6.6%
a7
 
5.1%
95
 
3.7%
84
 
2.9%
24
 
2.9%
62
 
1.5%
o2
 
1.5%
Other values (4)7
 
5.1%

Interactions


Correlations


Auto

The auto setting computes an easily interpretable pairwise association metric, choosing the method from the variable types of each column pair: categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
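The type-to-method mapping described above can be sketched as a small lookup. This is only an illustration: the function name `auto_method` and the type labels `"numerical"` and `"categorical"` are assumptions made for the example, not names taken from the tool that generated this report.

```python
# Hypothetical type labels; the report itself only names the methods.
NUMERICAL = "numerical"
CATEGORICAL = "categorical"


def auto_method(type_a: str, type_b: str) -> str:
    """Return the association measure the 'auto' mapping assigns to a column pair."""
    for t in (type_a, type_b):
        if t not in (NUMERICAL, CATEGORICAL):
            raise ValueError(f"unknown variable type: {t!r}")
    if type_a == NUMERICAL and type_b == NUMERICAL:
        return "Spearman's rho"
    # Any pair involving a categorical column uses Cramér's V; a numerical
    # column in such a pair would be discretized into bins first.
    return "Cramér's V"


if __name__ == "__main__":
    print(auto_method(NUMERICAL, NUMERICAL))
    print(auto_method(NUMERICAL, CATEGORICAL))
    print(auto_method(CATEGORICAL, CATEGORICAL))
```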

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
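As a minimal sketch of that calculation, the example below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman_rho`) are illustrative, and giving tied values the mean of their positions is an assumption about tie handling, since the text above does not specify it.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/n factors cancel


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson formula applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 3 for v in x]  # monotonic but not linear
    print("rho:", spearman_rho(x, y))
    print("r:  ", pearson(x, y))
```

On the cubic example the ranks of both variables agree exactly, so ρ reaches its maximum while Pearson's r on the raw values stays below 1.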

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
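The formula above can be written out directly. The sketch below is a plain-Python illustration (the name `pearson_r` is chosen for the example); it also demonstrates the invariance under shifts and positive rescaling mentioned earlier.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their std devs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    # r is undefined when either variable is constant (sx or sy is zero);
    # in that case this sketch raises ZeroDivisionError.
    return cov / (sx * sy)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]
    r = pearson_r(xs, ys)
    # Shifting and positively rescaling x leaves r unchanged.
    r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
    print(r, r_shifted)
```

Multiplying one variable by a negative constant flips the sign of r but keeps its magnitude.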

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
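The pair-counting definition above corresponds to what is usually called τ-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is illustrative. Statistical libraries often default to a tie-adjusted variant, so values computed this way may differ from theirs when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, as described in the text."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when x and y move in the same direction.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie: counted in the total but in neither group.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # One swapped pair out of six: 5 concordant, 1 discordant.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```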

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
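A minimal sketch of a bias-corrected Cramér's V follows, using the commonly cited form of the Bergsma (2013) correction: the chi-squared-based φ² and the table dimensions are each shrunk by a term that depends on the sample size. The function name and the choice to return 0.0 when a variable has a single category are assumptions for this example; the report's own implementation may differ in such details.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected quantities (assumed form of the Bergsma 2013 correction).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:  # a variable with a single category carries no association
        return 0.0
    return sqrt(phi2_corr / denom)


if __name__ == "__main__":
    same = ["a", "b"] * 50
    print(cramers_v_corrected(same, same))          # perfect association
    x = ["a", "a", "b", "b"] * 25
    y = ["c", "d", "c", "d"] * 25
    print(cramers_v_corrected(x, y))                # independent by construction
```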

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
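The nullity correlation shown in the heatmap can be illustrated with a short sketch. It assumes the measure is the Pearson correlation between 0/1 "value present" indicators of two columns, and it uses `None` to stand for a missing cell; both choices, like the function name `nullity_correlation`, are assumptions made for this example.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Correlation between the presence indicators of two columns.

    Returns None when either column is entirely present or entirely
    missing, because its indicator has no variance.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov / sqrt(va * vb)


if __name__ == "__main__":
    factor = ["offer results", None, "full disclosure", None]
    position = [8, None, 8, None]      # missing exactly when factor is missing
    other = [None, 2, None, 4]         # present exactly when factor is missing
    print(nullity_correlation(factor, position))
    print(nullity_correlation(factor, other))
```

A value near +1 means two columns tend to be filled in (or left empty) together, and a value near -1 means one tends to be present when the other is missing.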

Sample

First rows

|  | df_index | lat | long | sm_use | age | gender_id | ethnic_id | edu | politic_pref | sm_res_purp | sm_aware | sm_expmt_inerct | sm_data_use | ethic_appr | study_1_ethic_acc | study_1_conc | study_1_add_info | study_2_ethic_acc | study_2_conc | study_2_add_info | study_3_ethic_acc | study_3_conc | study_3_add_info | study_4_ethic_acc | study_4_conc | study_4_add_info | design_cont | design_num_users | design_res_purp | design_len_data | design_admin_inter | design_inter_type | design_partic_aware | design_inter_impact | design_type_data | design_add_fac | rank_sci_repro | rank_resp | rank_just | rank_anony | rank_harms | rank_balance | rank_pub_interst | rank_add_fac_1 | rank_add_fac_1_pos | rank_add_fac_2 | rank_add_fac_2_pos | rank_add_fac_3 | rank_add_fac_3_pos |
| 0 | 1 | 47.6034 | -122.3414 | Facebook | 29 | Male | Asian - Eastern | Highschool | Slightly liberal | Extremely aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys) | Creating fake accounts ("bots"),Secretly changing the content of what users see | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | The scope of the project and actions there in do not cross certain boundaries that may purposefully negatively affect participants as well as legal regulations and standard practices. | Neutral | NaN | NaN | Neutral | NaN | NaN | Neutral | NaN | NaN | Neutral | NaN | NaN | Not at all important | Not at all important | Not at all important | Not at all important | Not at all important | Slightly important | Slightly important | Not at all important | Not at all important | No. | 2 | 7 | 5 | 6 | 4 | 3 | 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2 | 33.058 | -80.0101 | Twitter | 33 | Male | Mixed race | Highschool | Neutral/ Neither conservative or liberal | Moderately aware | … are large and can contain millions of data points | Privately messaging users,Publicly posting on users' profiles,Secretly changing the content of what users see | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | I think Ethical Approval means that the experiment is gathering data without harm or injury to people. | Completely acceptable | NaN | NaN | Completely acceptable | NaN | NaN | Completely accepatable | NaN | NaN | Neutral | NaN | NaN | Not at all important | Not at all important | Not at all important | Not at all important | Not at all important | Not at all important | Moderately important | Not at all important | Not at all important | The only aspects of social media research that would cause concern for me is saving photographs or imaging data. | 3 | 5 | 2 | 6 | 1 | 7 | 4 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3 | 43.2817 | -71.6595 | Facebook | 33 | Female | Pacific Islander | Bachelor's degree | Very liberal | Extremely aware | … are large and can contain millions of data points,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots"),Secretly changing the content of what users see | Political elections (e.g. voting behavior),Presidential approval ratings,Communication (e.g. spread of opinions and hate-speech),News consumption (e.g. sharing of misinformation),Social networks | Researchers focus on ethical standards towards those they gain data from. They need approval of their approach and receive methods. | Completely acceptable | No concerns. I would have loved to partake in this study in terms of watching the results. | NaN | Completely acceptable | Going to the poster privately provided opportunity for change without the possibly of increased toxicity from users. I prefer this method over commenting the "correct information". | NaN | Somewhat acceptable | I find this is ethical as long as participants were fully aware of what was being monitored. The results are interesting! No concerns. | NaN | Somewhat unacceptable | I am uncertain how I feel completely about a researcher creating a fake account. However I do understand the desire to protect themselves and to not give away their actions as being part of a study. This misinformation needed to be corrected for the public but it opened the original poster to toxicity. The OP may not have known it was incorrect. | The researchers had a purpose in seeing the responses of those interacting with the post. I do not agree with how it was done entirely however I do not know a better way to get the results that were desired. | Extremely important | Very important | Very important | Extremely important | Moderately important | Very important | Moderately important | Extremely important | Not at all important | None that I can think of, other than what has been asked already. | 7 | 5 | 6 | 3 | 2 | 4 | 1 | Na | NaN | Na | NaN | Na | NaN |
| 3 | 4 | 35.8437 | -86.3881 | Facebook | 73 | Female | White / Caucasian | Highschool | Slightly conservative | Moderately aware | … are large and can contain millions of data points,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation) | I would think that using "ethical approval" means that the things others collect on social media sites would need to be honest and moral. Hopefully, there would be no under-handedness used in collecting information. | Neutral | I feel if people know they are being judged they will act, speak, or write differently than if they don't know they are being analyzed. | NaN | Somewhat acceptable | I feel as though, in the above case, users had a choice to respond or not so I think it was honest. | NaN | Somewhat acceptable | As long as the Facebook users were informed that they would be in a study I feel it is fair. It was up to the users whether they wanted to participate or not. Also, they were encouraged, but not actually made to Like the Facebook study. | NaN | Somewhat unacceptable | Users were not aware of what was going on so they were possibly more honest in their opinions because they had no idea they were being analyzed. | NaN | Moderately important | Moderately important | Extremely important | Very important | Moderately important | Very important | Extremely important | Very important | Very important | Reducing any type of hate is always a good thing. | 7 | 2 | 6 | 3 | 4 | 5 | 1 | Offer results to participants | 8 | NaN | NaN | NaN | NaN |
| 4 | 5 | 34.7456 | -92.3419 | Twitter | 27 | Female | Native-American | Highschool | Very liberal | Extremely aware | … often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | A set of rules of what to do and what to not do. | Completely acceptable | NaN | NaN | Completely acceptable | NaN | NaN | Completely unacceptable | The web extension being used was invasive, even if it was used with consent. The people participating in the study are not educated enough on exactly how much information the web extension was taking. | Making the source code for the web extension publicly available to have complete transparency over what the extension was doing. | Completely acceptable | NaN | NaN | Extremely important | Not at all important | Not at all important | Not at all important | Not at all important | Extremely important | Slightly important | Not at all important | Extremely important | NaN | 3 | 1 | 5 | 2 | 4 | 6 | 7 | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 6 | 25.6639 | -80.4372 | Facebook | 49 | Female | Hispanic | Bachelor's degree | Slightly liberal | Slightly aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Publicly posting on users' profiles,Creating fake accounts ("bots"),Hacking into users' accounts | Political elections (e.g. voting behavior),Health topics (e.g. spread of diseases),Well-being and economic satisfaction,News consumption (e.g. sharing of misinformation),Social networks | is when the participants have the right to know who was access to their data and what is being done with it. | Somewhat acceptable | na | na | Somewhat unacceptable | na | na | Completely accepatable | na | na | Somewhat acceptable | na | na | Very important | Moderately important | Extremely important | Very important | Very important | Very important | Very important | Very important | Very important | no | 7 | 2 | 6 | 4 | 1 | 3 | 5 | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 7 | 44.2433 | -88.3564 | Facebook | 53 | Male | White / Caucasian | Highschool | Slightly conservative | Slightly aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect,… are unaffected by the way social media platforms work | None of the above | Political elections (e.g. voting behavior),Presidential approval ratings,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation) | Verification of some sort that social media users and/or the data being used is not being skewed to support a theory or the results in any way. | Completey unacceptable | Easy enough for an outside government to try copying such a study with the sole purpose of creating much more polarization, hate, etc. Not that it hasn't been tried and tested perhaps innumerable times by all types of foreign or domestic entities as far as we know. No actual study would have really been needed to know that using a type of marketing manipulation could alter the recipients mood/levels of concern/anxiety/hate/etc. | NaN | Somewhat acceptable | NaN | Concerns over the possibility of the researchers having their own political agenda. Yet fake news is a major problem. What social media really is when mass sharing news (political news), is simple propaganda from the left and right. | Neutral | The researchers seem in some ways to try manipulating political viewpoints in a segment of the population for the sake of science. | NaN | Neutral | Many of the people that have large political followings on twitter (and many who don't) often know already the news they are sharing is fake. It's political partisanship and the spreading of propaganda. Some might post fake news only to gain more followers (the masses) if they believe it serves that end. | NaN | Moderately important | Extremely important | Slightly important | Very important | Moderately important | Extremely important | Extremely important | Extremely important | Very important | NA. I have already voiced my concerns about researching this in general from this surveys other questions. | 6 | 7 | 5 | 3 | 2 | 4 | 1 | A full disclosure of any political organizations of which a researcher belongs to or has donated to within a previous time frame (such as 4 yrs). | 8 | NaN | 9 | NaN | 10 |
| 7 | 8 | 42.0307 | -87.8107 | Reddit | 29 | Female | White / Caucasian | Highschool | Slightly liberal | Moderately aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Economic forecasting,Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation) | Going through a process of peer review maybe? Like earlier you mentioned creating bot accounts, so maybe making sure the researcher isn’t spreading hate or misinformation | Completely acceptable | NaN | NaN | Completely acceptable | NaN | NaN | Completely accepatable | NaN | NaN | Somewhat acceptable | NaN | NaN | Very important | Very important | Moderately important | Extremely important | Moderately important | Moderately important | Moderately important | Extremely important | Very important | The possibility of bot accounts spreading misinformation or hate speech just for the purpose of an experiment | 7 | 6 | 5 | 4 | 2 | 1 | 3 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 9 | 33.8838 | -118.1261 | Facebook | 23 | Male | White / Caucasian | Bachelor's degree | Neutral/ Neither conservative or liberal | Moderately aware | … are unaffected by the way social media platforms work | Publicly posting on users' profiles | Social networks | Social media is a collective term for websites and applications that focus on communication, community-based input, interaction, content-sharing and collaboration. | Neutral | NaN | NaN | Neutral | NaN | NaN | Neutral | NaN | NaN | Neutral | NaN | NaN | Moderately important | Moderately important | Moderately important | Moderately important | Moderately important | Moderately important | Moderately important | Moderately important | Moderately important | No, i didn't anything like that. | 5 | 4 | 7 | 2 | 6 | 3 | 1 | Increase your visibility | 6 | content-sharing | 5 | communication | 8 |
| 9 | 10 | 32.1453 | -110.9456 | Facebook | 65 | Male | Hispanic | Highschool | Very liberal | Moderately aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | Whether or not something goes against someone's right to privacy online. | Completely acceptable | NaN | I would be interested to know what kind of messages they sent the hate speech users that got them to change their minds. | Completely acceptable | It's perfectly within someone's right to send someone else a message on any platform, therefore I believe this study was acceptable. | NaN | Completely accepatable | People willingly consented to being part of the research study, so I believe the study was completely acceptable. | NaN | Completely acceptable | NaN | NaN | Very important | Not at all important | Slightly important | Not at all important | Not at all important | Not at all important | Moderately important | Very important | Not at all important | None. | 5 | 3 | 7 | 4 | 1 | 6 | 2 | The researchers should not intrude into the user's personal lives | 8 | NaN | NaN | NaN | NaN |

Last rows

df_indexlatlongsm_useagegender_idethnic_idedupolitic_prefsm_res_purpsm_awaresm_expmt_inerctsm_data_useethic_apprstudy_1_ethic_accstudy_1_concstudy_1_add_infostudy_2_ethic_accstudy_2_concstudy_2_add_infostudy_3_ethic_accstudy_3_concstudy_3_add_infostudy_4_ethic_accstudy_4_concstudy_4_add_infodesign_contdesign_num_usersdesign_res_purpdesign_len_datadesign_admin_interdesign_inter_typedesign_partic_awaredesign_inter_impactdesign_type_datadesign_add_facrank_sci_reprorank_resprank_justrank_anonyrank_harmsrank_balancerank_pub_interstrank_add_fac_1rank_add_fac_1_posrank_add_fac_2rank_add_fac_2_posrank_add_fac_3rank_add_fac_3_pos
48949037.1235-76.4502Facebook37FemaleWhite / CaucasianMaster's degree or aboveSlightly liberalVery aware… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collectNone of the abovePolitical elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networksThat the researchers would use the acquired users data in an ethical manner without manipulating it. Keeping the users data safe and secure.Somewhat acceptableAccounts were anonymous and operated by the human which is fine and the outcome was awesome so I would say that type of research is somewhat acceptable.NaNSomewhat unacceptableCreating fake accounts and sending unsolicited private messages to users is unethical.Not ethically acceptable to me.Completely unacceptablethe researchers tried to bribe and manipulate the social media usersI do not approve this practice by the researchersSomewhat unacceptablecreating fake accounts for research or any other purposes is not acceptable or ethical to me.NaNVery importantVery importantExtremely importantExtremely importantExtremely importantVery importantExtremely importantExtremely importantExtremely importantPrivacy of users data.7312654NaNNaNNaNNaNNaNNaN
49049130.4941-90.4751Twitter44MaleAfrican-AmericanHighschoolSlightly liberalModerately aware… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collectPrivately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots")Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networksTo be approved by the original source it came from.Completely acceptableThis should be done more often, It's a good thing to do, completely acceptable.Hate speech is a serious issue, we need to do better.Somewhat acceptableI think it's in their best concerns to reduce the amount of misinformation, and also help fact check what's posted.It's acceptable on my behalf due to the researchers posting facts.Somewhat acceptableI can relate to going to another news source to see what information they're giving, and doing this study in this type of way is intriguing.NaNNeutralI'm not sure if this is good, or badNaNExtremely importantExtremely importantExtremely importantExtremely importantVery importantExtremely importantVery importantExtremely importantExtremely importantWho the researchers are targeting on social media sites. Race, sex, job type, political view.2457316NaNNaNNaNNaNNaNNaN
49149226.6054-81.7284Facebook39FemaleWhite / CaucasianHighschoolSlightly conservativeExtremely aware… are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collectPrivately messaging users,Creating fake accounts ("bots"),Secretly changing the content of what users seePolitical elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation)It means that possible risks have been considered and deemed acceptable.Completey unacceptableParticipants should have the right to accept or decline to participate in the study.NaNCompletely unacceptableParticipants should be made aware of the study and have the option to either accept or decline being included in it.NaNCompletely accepatableNaNNaNCompletely unacceptableThe participants should have been made aware that they were part of a study and either accept or decline taking part in it.NaNVery importantVery importantVery importantExtremely importantModerately importantModerately importantExtremely importantVery importantVery importantnone7142356NaNNaNNaNNaNNaNNaN
492 | 493 | 40.5662 | -79.7078 | Reddit | 54 | Male | White / Caucasian | Bachelor's degree | Neutral/ Neither conservative or liberal | Slightly aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | Assurance that the experimenters will use the data and information collected only for the purpose explained in the study. Also, that the person being polled is aware of their rights and redresses, if necessary, by a board or body overseeing the researchers. Generally, that the experiment will cause no foreseeable harm to the people being polled. | Neutral | It's concerning that the study misrepresented the nature of the anonymous accounts who replied to the message. It's understandable that they wanted sincere reactions to the messages they sent and that informing the recipients they weren't real people could have caused the messages to be disregarded or met with a level of denial, but since there were a range of responses to the original hate speech, I wonder if any of the replies were incendiary, which could cause the original user to get even more emotionally involved, stressed, or angry, which could lead to actual violence or emotional distress. I'd imagine if they were trying to measure how people reacted to different messages they would have to have them grouped into at least Empathetic, Neutral, and Contrary types of messages the researchers sent. | It's hard to judge without seeing the actual content of the messages, so I'd want to see that and who is overseeing the study and how closely it's being monitored. | Somewhat acceptable | Same as the others, that the subjects were unaware of the experiment. I do find the anonymous accounts more acceptable than the human-looking automated accounts. | NaN | Completely accepatable | NaN | It seems the study was forthcoming and transparent and that participants had to opt in to join it, so I can't see any issues, as long as all other processes are in place (eg. the study is being overseen, etc.) | Somewhat acceptable | While most of the study seems innocuous, for instance, the bot is just replying with a Tweet about fact-checking that the user can choose not to click, it's always concerning when the subjects don't know they're part of an experiment and that the automated accounts were apparently made to look like a human user. | I'd want to make sure the study is only using publicly-made Twitter statements and not going any further by looking into other social media sites the user might have linked or any other biographical information that could be discerned. | Slightly important | Not at all important | Slightly important | Not at all important | Very important | Slightly important | Extremely important | Not at all important | Extremely important | NaN | 7 | 2 | 5 | 6 | 1 | 3 | 4 | NaN | NaN | NaN | NaN | NaN | NaN
493 | 494 | 34.0264 | -117.936 | Facebook | 32 | Male | White / Caucasian | Master's degree or above | Slightly conservative | Very aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | Ethical approval means getting approval from the University or the government or both. | Completely acceptable | NaN | NaN | Completely acceptable | NaN | Some people may not want to be contacted privately. | Completely unacceptable | Choosing between the results of their own data or money is completely unacceptable. Why should participants have to pay to view their own data? They created it, so they should have access to it if they want it. | NaN | Completely acceptable | NaN | NaN | Very important | Very important | Very important | Slightly important | Very important | Extremely important | Very important | Extremely important | Extremely important | NaN | 7 | 2 | 5 | 1 | 4 | 3 | 6 | None | NaN | NaN | NaN | NaN | NaN
494 | 495 | 37.2697 | -81.2212 | Facebook | 35 | Female | White / Caucasian | Bachelor's degree | Neutral/ Neither conservative or liberal | Moderately aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Creating fake accounts ("bots"),Secretly changing the content of what users see | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | Approval to do any type of thing that might be deceptive. | Completely acceptable | NaN | NaN | Somewhat unacceptable | It seems a little too deceptive to me. | NaN | Completely accepatable | They weren't really deceptive. | NaN | Completely acceptable | NaN | NaN | Moderately important | Moderately important | Very important | Very important | Very important | Very important | Extremely important | Very important | Very important | No. | 6 | 5 | 4 | 1 | 3 | 7 | 2 | NaN | NaN | NaN | NaN | NaN | NaN
495 | 496 | 40.5828 | -73.9532 | Facebook | 39 | Male | White / Caucasian | Master's degree or above | Very conservative | Moderately aware | … reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect,… are always representative of people’s offline behavior,… are unaffected by the way social media platforms work | Publicly posting on users' profiles | Health topics (e.g. spread of diseases),Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | It has to with researchers taking a mental note of the standards meant to be followed while conducting research experiment. | Completely acceptable | NaN | NaN | Somewhat acceptable | NaN | NaN | Somewhat acceptable | NaN | NaN | Completely acceptable | NaN | NaN | Extremely important | Slightly important | Extremely important | Moderately important | Slightly important | Extremely important | Extremely important | Slightly important | Very important | I can't think of any other aspects. | 7 | 3 | 2 | 4 | 1 | 5 | 6 | NaN | NaN | NaN | NaN | NaN | NaN
496 | 497 | 40.2602 | -76.8591 | Facebook | 37 | Female | African-American | Highschool | Very liberal | Not at all aware | … often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Communication (e.g. spread of opinions and hate-speech),News consumption (e.g. sharing of misinformation),Social networks | I think ethical approval means that institutions have to deem the experiments as tests that most would approve of. | Completely acceptable | NaN | NaN | Completely acceptable | NaN | NaN | Completely accepatable | NaN | NaN | Completely acceptable | NaN | NaN | Not at all important | Not at all important | Not at all important | Not at all important | Not at all important | Not at all important | Slightly important | Not at all important | Not at all important | No | 7 | 4 | 6 | 3 | 1 | 2 | 5 | NaN | NaN | NaN | NaN | NaN | NaN
497 | 498 | 40.8275 | -73.1225 | Reddit | 23 | Female | African-American | Highschool | Slightly liberal | Slightly aware | … are large and can contain millions of data points,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles | Political elections (e.g. voting behavior),Economic forecasting,Presidential approval ratings,Well-being and economic satisfaction,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | I think ethical approval means that the experiment has to be deemed as appropriate, safe, and not have long-term consequences. | Somewhat unacceptable | It is unacceptable that the users were never made aware that it was a study and the researcher analyzed the user's behaviors for weeks. | NaN | Somewhat unacceptable | It is good that the researchers only examined data that was collected during the experiment period but unacceptable that the users were messaged privately and were not made aware that it was a experiment. | NaN | Somewhat acceptable | It is good that users were made aware that it was a study and what the users had to do was related to the research topic. | NaN | Neutral | It is good that the researchers only analyzed the users' behaviors for a short period after the experiment but it is not appropriate that the researchers never told the users that it was an experiment. | NaN | Slightly important | Not at all important | Not at all important | Extremely important | Not at all important | Moderately important | Very important | Very important | Moderately important | NaN | 6 | 2 | 7 | 3 | 4 | 5 | 1 | Long-term effects of the experiment | 4 | NaN | NaN | NaN | NaN
498 | 499 | 39.0518 | -94.4046 | Twitter | 55 | Male | White / Caucasian | Vocational training | Very liberal | Moderately aware | … are large and can contain millions of data points,… reflect events in real-time and can be collected continuously over time,… are naturalistic in that they do not require researchers to directly interact with research volunteers,… often capture social relationships not found using traditional methods (e.g. surveys),… are readily accessible to researchers and easy to collect | Privately messaging users,Publicly posting on users' profiles,Creating fake accounts ("bots") | Political elections (e.g. voting behavior),Presidential approval ratings,Communication (e.g. spread of opinions and hate-speech),Public sentiment (e.g. environment-related concerns),News consumption (e.g. sharing of misinformation),Social networks | I think ethical approval is that an academic experiment is run in an ethical way. Meaning that the researchers adhere to ethical standards. | Somewhat unacceptable | The fact that participants were not aware they were part of a research study is a concern. | NaN | Somewhat unacceptable | I find it somewhat unacceptable that researchers sent unsolicited private messages. | If the researchers had contacted the twitter users instead of sending unsolicited private messages I would probably find it a little more ethical. | Completely accepatable | NaN | NaN | Somewhat unacceptable | The users were not informed that they are part of an academic research and deceived by human looking automatic accounts. | NaN | Moderately important | Not at all important | Moderately important | Moderately important | Moderately important | Very important | Extremely important | Slightly important | Moderately important | NaN | 7 | 1 | 3 | 2 | 5 | 6 | 4 | NaN | NaN | NaN | NaN | NaN | NaN